HaskellYesod

Data Migration in Yesod

Posted on by 

Yesod has a nice feature to define database models. It uses the persistent library to migrate your database to a state which fits your defined application models. This is true for very new project setups. With time, your project will evolve and change. For this case, there is no nice solution in Yesod. For instance, if you add a column which is mandatory, Yesod is not able to migrate since it does not know what to do with existing data. In this blog post, we will present one possible migration path at the application start.

So let's start and define a simple model in Yesod's config/models file:

Product
  uuid Text
  title Text
  createdAt UTCTime default=now()
  UniqueProductUuid uuid
  Primary uuid

We create a very simple product model in a shop data structure (or something similar. I was just tired to use the blog example over again) Now. We assume the project changes over time, and we need a Category model to our project:

Category
  uuid Text
  title Text
  UniqueCategoryUuid uuid
  Primary uuid

Great! Some time passes by, we add categories, we add products, application is deployed and running. Our users are able to search for categories and products. Now, we decide to add a categoryId to a product (for what reason ever, maybe your client decided to, or your product owner came up with a story/requirement like this). At this point, our default migrations would get stuck. The field categoryId is mandatory, but we don't know how to fix the existing product models.

Here is how we can approach the problem: Step 1: Create an SQL script which we will execute to migrate the data Step 2: Write the changes we made in our database. Hence, for the next startup, we don't execute the same SQL script again.

Since the second step is our migration trick, we will beginn with this. Let's create a file db_migrations.sql in the config folder with the following content:

CREATE TABLE IF NOT EXISTS public.db_migrations
(
    file text NOT NULL,
    created_at timestamp with time zone NOT NULL default now(),
    CONSTRAINT db_migrations_unique_file UNIQUE (file)
)
WITH (
    OIDS = FALSE
);

As you can see, we create a new table if it does not exist. The structure is very simple and clear. We just add a file column and a created date. The file will indicate which files we already executed. And the created at field will tell us when this script was executed (just for information and easier debugging). We can easily check for a file in our migrations database. If this file has already an entry, we don't execute this again. Straight forward and simple.

What about step 1, though? We still need to run migrations for existing models. In this case we can introduce a new column categoryId and define it mandatory on the Product model like this:

Product
  uuid Text
  title Text
  createdAt UTCTime default=now()
  categoryId CategoryId
  UniqueProductUuid uuid
  Primary uuid

If we do so, we will end up with a stuck migration and our application won't run. Let's change our model as planned, but this time add a folder migrations to the config folder and place a file with the name 1-category-id-in-products.sql:

DO $$
BEGIN
    IF NOT (SELECT EXISTS (SELECT 1 FROM db_migrations WHERE file=?)) THEN
      BEGIN
          BEGIN
              ALTER TABLE product ADD COLUMN category_id text NOT NULL default 'empty';
              UPDATE product SET category_id=(SELECT uuid FROM category WHERE title='Default category' LIMIT 1);
          EXCEPTION
              WHEN duplicate_column THEN RAISE NOTICE 'Column category_id already exists in table product.';
          END;
      END;

      INSERT INTO db_migrations VALUES (?);
    END IF;
END $$

Uhm, ok, this looks maybe a bit complicated. Let's go through the code. DO and BEGIN statements at the beginning of the file are specific to postgres and will enable us to use IF statements. We can the check if our file, which is being executed already exists in our migrations database. If the entry does not exists, we go on and alter our product table to have a category_id column, which is per default filled with a useless string 'empty'. Afterwards, we select a default category and update all rows in the product table to use this category as its reference. This will satisfy our constraint that the categoryId is mandatory in the product model. Finally, if everything worked out as expected, we add the executed script to our migrations database so we do not execute this migration again at startup.

So far, we created models and SQL files. But we still miss something in our code to execute those files for us. The place to add this kind of code is in the Application.hs file (if you have set up the project from the default Yesod template). We need to locate the runMigration function with its parameter migrateAll. This is the point our model files are parsed and transformed into the correct database schema. Hence, we need to execute something before this happens (otherwise it would be too late). The persistent library offers a function called rawExecute which we can run at startup of our application. So, let's try to add the following code to our Application.hs file right before the runMigration function:

dbMigrationsSql <- readFile "config/db_migrations.sql"
runLoggingT (runSqlPool (rawExecute (TE.decodeUtf8 dbMigrationsSql) []) pool ) logFunc

In the first line we read the migrations database creation SQL-script from file. The second line will run our SQL content inside our SQL connection pool, which is already created at this point.