
Practice 5, Trunk-based development The second thing, and my favourite, is to simply deploy more often. How can you achieve that? The software engineering answer is to practice trunk-based development (TBD), and in my experience, that works quite well for all kinds of data applications as well.

However, practicing trunk-based development is actually quite hard: you’ll need to know how to do branching by abstraction (the best way to do it in a data context) and to have a team able to practice it. But if you are able to implement TBD, you’ll be able to increase your deployment frequency by an order of magnitude.
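Branching by abstraction can be sketched in a few lines: old and new implementations both live on trunk behind one interface, and a flag decides which one runs. The names below (`load_orders_v1`, `load_orders_v2`, `use_new_loader`) are illustrative, not from the original text.

```python
from typing import Callable


def load_orders_v1() -> list[dict]:
    # The current, battle-tested loader that production relies on.
    return [{"order_id": 1, "source": "v1"}]


def load_orders_v2() -> list[dict]:
    # The new loader, developed on trunk but not yet the default.
    return [{"order_id": 1, "source": "v2"}]


def get_order_loader(use_new_loader: bool) -> Callable[[], list[dict]]:
    # The abstraction: callers depend on this function, never on v1/v2
    # directly, so the new code can be merged to trunk long before it is
    # switched on, and switched back off without a revert.
    return load_orders_v2 if use_new_loader else load_orders_v1


orders = get_order_loader(use_new_loader=False)()
```

Because the switch is a runtime flag rather than a long-lived branch, every commit can go to trunk and be deployed immediately, which is exactly what drives deployment frequency up.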

(3) Decrease your time to restore Time to restore is the time it takes to recover from any unplanned outage. Sadly, in my experience, for data applications this is measured in days, not minutes or hours.

If you catch yourself saying "The dashboard will work tomorrow”, you are basically saying "we cannot restore in less than 24 hours, and it might just as likely take 48”.

The key problem in data applications with time to restore is the data. If you’re running one big data sync once a day that takes 8 hours, and that breaks, the time to restore is going to be huge, no matter the origin of the problem.

Luckily, simply moving to smaller batches will already bring down your time to restore by a lot. If you don’t want to go to smaller batches, or it isn’t reasonable in your case, there is one completely underused technique that you need to know.
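The effect of smaller batches is easy to see in a sketch: instead of one 24-hour sync, split the interval into small, independently re-runnable windows, so a failure only forces a replay of one window instead of the whole day. This is a minimal illustration, not from the original text.

```python
from datetime import datetime, timedelta


def batch_windows(start: datetime, end: datetime, size: timedelta):
    """Split one big sync interval into small, independently re-runnable windows."""
    windows = []
    cursor = start
    while cursor < end:
        windows.append((cursor, min(cursor + size, end)))
        cursor += size
    return windows


# One daily sync becomes 24 one-hour windows: if the 17:00 window fails,
# only that hour needs to be replayed, which is what shrinks time to restore.
day = batch_windows(datetime(2023, 1, 1), datetime(2023, 1, 2), timedelta(hours=1))
```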

Practice 6, Rollbacks Rolling back to an older version of a software component is pretty common. Rolling back to an older version of data, less so. The most common implementation of "rolling back” is to spin up a backup of your database.

So if you’re running a nightly backup of your database, simply make sure you can spin that one up and replace your production database with it within a couple of minutes.

If you’re on Snowflake, it might make sense to always keep a today/yesterday version of your data stored, such that you can swap them in a minute if something breaks.
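On Snowflake the swap itself is a single `ALTER TABLE ... SWAP WITH ...` statement, which atomically exchanges the two tables. A minimal sketch, assuming hypothetical table names and that you run the statement through whatever Snowflake connector you already use:

```python
def swap_sql(prod_table: str, snapshot_table: str) -> str:
    # Snowflake's SWAP WITH atomically exchanges two tables, so pointing
    # production back at yesterday's snapshot is one statement.
    # Table names here are illustrative.
    return f"ALTER TABLE {prod_table} SWAP WITH {snapshot_table};"


# Execute this string via your Snowflake cursor/connection of choice:
rollback = swap_sql("analytics.orders", "analytics.orders_yesterday")
```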

If you’re on a data lake, thanks to data versioning solutions you’ll be able to implement this in an instant.

Whether you’re on a data warehouse or on a data lake, the general practice that enables rollbacks is called "functional data engineering”. In functional data engineering, rolling back is as simple as setting the "latest data timestamp” one step back, without the need to spin up pieces of infrastructure.
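The idea can be sketched in a few lines: every run writes an immutable snapshot keyed by its timestamp, and "production" is just a pointer to the latest key. Rolling back means moving that pointer one key back. The snapshot data and keys below are illustrative.

```python
# Each load writes an immutable snapshot; nothing is ever overwritten.
snapshots = {
    "2023-01-01": [{"revenue": 100}],
    "2023-01-02": [{"revenue": -1}],  # the broken load
}
latest = "2023-01-02"  # the "latest data timestamp" pointer


def roll_back(snapshots: dict, latest: str) -> str:
    # Move the pointer one snapshot back; no infrastructure is touched.
    keys = sorted(snapshots)
    i = keys.index(latest)
    if i == 0:
        raise ValueError("no older snapshot to roll back to")
    return keys[i - 1]


latest = roll_back(snapshots, latest)
```

Because old snapshots are never mutated, this rollback takes effect the moment the pointer changes, and rolling forward again is just as cheap.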

(4) Decrease lead time The final metric you want to drive down is lead time, the time it takes from starting to code to having your code in production.

While the previous practices focused on automation, none of them focused on supporting local development.

There are a lot of things you could do, but in my opinion the most important three are the following.

Practice 7, Have a development environment A confession: I am not coding on my laptop anymore. I keep everything I do inside Codespaces, in dockerized development environments. Why? Because I switch between different projects and programming languages quite often. That switching takes some time; I always need to familiarize myself with the given environment.

Dockerized development environments make this 10 times easier, and things like devcontainers even containerise my VSCode setup.
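A minimal devcontainer setup is just one `.devcontainer/devcontainer.json` file in the repository. The sketch below uses the Microsoft-published Python dev container image and one editor extension as placeholders; swap in whatever your project actually needs.

```json
{
  "name": "data-app",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python"]
    }
  }
}
```

With this in place, opening the repository in VSCode or Codespaces drops you straight into the same containerised environment as everyone else on the team, tooling included.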