Patterns for database migration

4/11/2013

Over the years, I have seen a number of patterns for creating and maintaining database schemas. Some of them work for some situations, and many of them don't. Here's a non-comprehensive list:

  1. Manual. When a change to the schema is needed, someone needs to log into the server and update the schema by hand. Maybe the required changes are written down in a release notes document, maybe somewhere else.
  2. Using an ORM tool to generate and update the schema when the application starts.
  3. Keeping a folder of SQL scripts in the project, and having some custom process inspect the folder during deployment or startup, find new scripts, and execute them.
  4. A combination of letting the ORM tool update the schema, and then running SQL scripts.
  5. Using a database migration framework.

Some of these styles work just fine, even those that sound like they would not:

Manual updates

Manual updates work fine for things that you hardly ever need to deploy, or where the schema only changes once a decade. This blog's database schema is 'updated' this way, i.e. it has yet to change from version 1.

Auto-generating the schema

Auto-generating the database works fine if the database is of no use after the application has shut down, and you might as well drop the entire schema and re-generate it on startup. If all of the application's state is in effect volatile, e.g. in a long-running batch process that uses a database while running, dumping and re-creating will work fine. Otherwise, you'll only get partial updates, as the ORM framework cannot do anything more complicated than creating new nullable columns, tables and foreign keys. If you need anything more than that, you're out of luck and need to ssh into prod.
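
For reference, the switch behind this pattern in Hibernate is the hibernate.hbm2ddl.auto property. A minimal sketch of the two relevant settings (bootstrap code omitted, property values as documented by Hibernate):

    import java.util.Properties;

    public class HibernateSchemaSettings {

        /** Throwaway schema: created at startup, dropped at shutdown. */
        public static Properties throwawaySchema() {
            Properties props = new Properties();
            // Fine when all application state is volatile anyway, e.g. a batch run.
            props.setProperty("hibernate.hbm2ddl.auto", "create-drop");
            return props;
        }

        /** Incremental updates: the mode this section warns about. */
        public static Properties incrementalUpdates() {
            Properties props = new Properties();
            // Hibernate diffs the mappings against the live schema, but it can only
            // add new nullable columns, tables and foreign keys; it will not rename
            // columns, change types or migrate data.
            props.setProperty("hibernate.hbm2ddl.auto", "update");
            return props;
        }
    }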

SQL update scripts

This is usually the first thing an organization comes up with when they notice they need schema updates, or need to add initialization data when the application is deployed. This can work, but is usually implemented poorly: when a script does not run successfully, what does that mean for the deployment of the application? In each and every case I have seen so far, the developers would log a warning and continue with the next script.

This is as wrong as could be. Before long, what you have are failed update scripts, sometimes leading to problems immediately, sometimes later. You get an inconsistent database state, especially if you have multiple environments in which different sets of scripts succeeded or failed. This is a path to disaster.

What a failed update script should mean is a hard deployment failure. The first failed statement must fail the deployment or startup of the application, and corrective action must be taken right away. You cannot allow different database schemas in different environments. Ever.
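
If you do end up rolling your own script runner, the core requirement is easy to sketch. The following is a simplified, hypothetical example (it assumes one statement per script, or a JDBC driver that accepts a whole script as one batch); the point is the exception in the catch block:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.sql.Connection;
    import java.sql.Statement;
    import java.util.List;
    import javax.sql.DataSource;

    public class StrictScriptRunner {

        private final DataSource dataSource;

        public StrictScriptRunner(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        /** Runs the scripts in order; the first failure aborts the deployment. */
        public void runAll(List<Path> scriptsInOrder) {
            for (Path script : scriptsInOrder) {
                try (Connection conn = dataSource.getConnection();
                     Statement stmt = conn.createStatement()) {
                    stmt.execute(new String(Files.readAllBytes(script), StandardCharsets.UTF_8));
                } catch (Exception e) {
                    // Do NOT log a warning and move on to the next script:
                    // fail the startup hard and force corrective action.
                    throw new IllegalStateException("Update script failed: " + script, e);
                }
            }
        }
    }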

Auto-generated schemas with update scripts

Thankfully, I have only seen this pattern once. The application would start up, run Hibernate's hbm2ddl schema update to get the schema in line with the current version, and then evaluate a directory of SQL update scripts and run all the new ones.

This soon leads to a world of pain. The main conceptual problem is that, during each application deployment, the ORM schema update runs first, making all the updates it can, and the SQL update scripts run afterwards, regardless of the order in which the changes were originally made. This means there is an element of randomness in the order in which changes are executed. Consider, for example, a column newColumn that is first added via the entity mappings, then referenced by two custom update scripts, and finally renamed in the mappings again before the next release.

The changes will work fine on the test server, where each of them was deployed as it was made, but fail in production: the ORM update only knows the final mappings, so newColumn is never created there, and the custom scripts both fail in a cascade of fuckups. Great.

So, whatever you do, don't try the auto-DDL + custom script combination; it's a sure recipe for disaster.

Using a database migration framework

I have experience with two database migration frameworks: Liquibase and Flyway.

Both are easy to integrate into Spring applications. The main difference between the two is their attitude towards down migrations (more on that later). I'd usually lean towards Liquibase for new projects.

The way both work is that you hook them into your application startup and let them do their magic before any ORM framework initializes. They both expect migrations to be located in a specific (configurable) folder, from where they are picked up and executed in order if they are new.
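
As a rough sketch of what that hook looks like with Flyway (using the fluent API of newer Flyway versions; the 2013-era API used plain setters instead, but the idea is the same):

    import javax.sql.DataSource;
    import org.flywaydb.core.Flyway;

    public class DatabaseMigrator {

        /** Runs all pending migrations; call this before the ORM framework initializes. */
        public static void migrate(DataSource dataSource) {
            Flyway flyway = Flyway.configure()
                    .dataSource(dataSource)
                    .locations("classpath:db/migration") // the (configurable) migration folder
                    .load();
            flyway.migrate(); // throws on failure, so a broken migration aborts startup
        }
    }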

From a developer perspective, it is important to keep in mind that scripts cannot ever be changed once deployed (how would that work?). If a migration makes a mistake, it needs to be corrected with another migration. Both frameworks check the migrations' checksums to prohibit changes.

When a migration fails, the process is aborted, an error is logged, and the application does not start. This is a tremendously important safeguard against cascades of broken scripts.

Down migrations

Where the two frameworks differ is their attitude towards down migrations. A down migration is what you need when you deploy an older version of the application than was previously deployed, for example when a regression is found and the last known-good version is installed again.

Liquibase offers a large number of XML commands for common DDL changes, like adding columns and indexes. For these types of changes, it can automatically provide the appropriate down migrations as well, which makes the XML notation quite powerful. For custom SQL, the user can provide a custom down script to reverse the changes done by the script. This is completely optional, however.

Flyway's stance on down migrations is "We're not doing that", or, more specifically, "From a database perspective, every migration is a forward migration." If you require down migration support, Flyway cannot help you, and does not intend to. Maybe rightfully so: if you ever were to use a down migration, would it work? Did you write that custom script, and more importantly, did you test it properly? Does your build pipeline include a step where down migrations are tested? If not, you don't have down migrations, and Flyway does not try to hide that fact from you.

If you still think that you need down migrations, you need to use Liquibase, and you need to put in the work to make your migrations run in the other direction as well, and test the whole process in an automated fashion. If you don't, the down path won't work when you need it most.

Transactional schema changes

Some database management systems have the ability to wrap schema changes in a transaction. This way, a migration can only fail or succeed as a whole, and you can never get an undefined state due to a partially executed migration. This is so massively helpful that it may influence your choice of database. For example, I would always choose PostgreSQL over MySQL for this reason alone.
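
To illustrate, here is a sketch against PostgreSQL (made-up table and connection details): all three statements, DDL included, either commit together or roll back together.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class TransactionalDdlExample {

        public static void main(String[] args) throws SQLException {
            String url = "jdbc:postgresql://localhost:5432/app"; // placeholder connection details
            try (Connection conn = DriverManager.getConnection(url, "app", "secret")) {
                conn.setAutoCommit(false);
                try (Statement stmt = conn.createStatement()) {
                    stmt.execute("ALTER TABLE customer ADD COLUMN loyalty_points integer");
                    stmt.execute("UPDATE customer SET loyalty_points = 0");
                    stmt.execute("ALTER TABLE customer ALTER COLUMN loyalty_points SET NOT NULL");
                    conn.commit(); // the migration becomes visible as a whole
                } catch (SQLException e) {
                    conn.rollback(); // on PostgreSQL, even the ADD COLUMN is undone
                    throw e;
                }
            }
        }
    }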

Using a database user with restricted access rights

If the database user your application uses does not have the rights to execute DDL, don't despair. Just configure an extra datasource specifically for the migration framework, one that is not otherwise used but has the required rights to change the database schema. (Restricting the application user is often mandated for security reasons, so that when someone manages to take over your application, all they can do is... well, delete all your data.)
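
In Spring Java config, such a setup could look roughly like this (placeholder URLs, user names and passwords; shown with Flyway, but the same idea works with Liquibase):

    import javax.sql.DataSource;
    import org.flywaydb.core.Flyway;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.context.annotation.DependsOn;
    import org.springframework.jdbc.datasource.DriverManagerDataSource;

    @Configuration
    public class MigrationConfig {

        private static final String URL = "jdbc:postgresql://localhost:5432/app";

        /** Privileged user with DDL rights, used only while migrations run. */
        @Bean
        public DataSource migrationDataSource() {
            return new DriverManagerDataSource(URL, "app_ddl", "ddl-secret");
        }

        /** Runs the migrations at startup, before anything else touches the schema. */
        @Bean(initMethod = "migrate")
        public Flyway flyway() {
            return Flyway.configure().dataSource(migrationDataSource()).load();
        }

        /** Restricted user for the application itself; waits for the migrations. */
        @Bean
        @DependsOn("flyway")
        public DataSource dataSource() {
            return new DriverManagerDataSource(URL, "app_user", "app-secret");
        }
    }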

To sum up, there are working database migration frameworks out there, and I recommend using them when you have a database that changes. By the time a migration reaches production, it should have executed on each of the test servers and be known to work.

And if you absolutely positively need to roll your own, make sure a broken migration halts the deployment. Do not allow failed migrations to go undetected.

And make your life easy with a database system that supports transactional DDL, if you can. Things that fail in a transactional way are much more easily fixed.
