In this article I explain how I built a gem that facilitates the synchronization of columns between different models, facilitating certain database maintenance processes and improving data consistency in denormalized databases.


While studying databases at the university, one of the principles I learned was data normalization, where a relational database is structured to reduce data redundancy. Over the years, I learned there are some good reasons to keep data denormalized like boosting performance at the cost of risking data integrity.

It is also common to see redundant data while a database refactor is in progress. For example, if we are moving a column from table A to table B, the column may exist, in sync, in both A and B until the application(s) involved stop using the deprecated column. Keeping these columns in sync introduces a performance overhead and extra complexity, as either the database (preferably) or the application needs to make sure the data remains consistent. The shorter this period, the better.

While the scenario depicted above is intentional and controlled, we sometimes find applications where the same piece of information was stored inadvertently in two different tables, by two different teams. This will certainly not happen in projects where a handful of developers are involved, or in teams where domain boundaries are very clear. But some (most?) companies do not live in Wonderland, and people in the team come and go. Suddenly you realize your application is reading similar information from multiple models. And even though not ideal, that’s fine as long as the business can move on. These mistakes usually don’t bring the company down immediately, but increasing the technical debt may do if it grows too big.

Keeping columns in sync

In the process of paying this technical debt, or during a structured, planned database refactor, we sometimes need to keep certain columns in sync until one of them is no longer used and can be removed. Let’s say we are storing the country of our customers in companies.country as well as in subscriptions.country_code, assuming a 1:1 relation between companies and their subscription to our product.

Maintaining consistency necessarily involves two steps:

  1. An initial synchronization. There are likely records where the value of the country in one of the tables does not match the value in the other table. There is no general rule to know which is the right one: business decisions and possibly manual work will be needed to determine which is the right value for every customer.
  2. A mechanism to maintain the data in sync after the initial synchronization.

One of the ways to solve this redundancy is by using database triggers. We can configure the database to update companies.country whenever subscriptions.country_code is modified, and vice-versa. To do that, we need:

  • A function that updates companies.country with the value from subscriptions.country_code.
  • A function that updates subscriptions.country_code with the value from companies.country.
  • A trigger to execute the first function when a record is updated in the subscriptions table.
  • A trigger to execute the second function when a record is updated in the companies table.

Solving this problem with Rails

When I was considering applying this solution to one of my projects, I found it was everything but easy. It was too much work for a simple task. I thought this would be a good opportunity to investigate this challenge and hopefully build a tool that the community can use.

The first blocker I encountered is that Rails does not offer built-in support to create functions and triggers. I was very happy to find the F(x) gem, which solves this problem out of the box. Not only does it provide an easy way to define or redefine functions and triggers. It updates the db/schema.rb file to reflect the functions that are defined in the database already. Without using this gem you could write the migration in SQL to define a function or trigger, but that would not be reflected in the schema file, reducing its visibility (nobody is going to search through all migration files to find out if there is a function defined by any chance). I am very grateful the authors took the time to make this contribution to the community.

I started designing this gem from the perspective of how I would like to use it. I thought it would be so convenient to write a migration like:

class SyncColumns < ActiveRecord::Migration[7.1]
  def change
    sync_columns(Company => :country, Subscription => :country_code)
  end
end

It took me some time to define the sync_columns method and include it in the ActiveRecord::Migration module, but once I had it I felt very satisfied. I wrote some tests to make sure that both columns were kept in sync no matter which one was modified first. I started writing this article and stopped when I realized there was a big problem: objects were updated in the database, but that update was never reflected in objects already loaded in Ruby memory.

I was committed to fixing this scenario in pure TDD style. I wrote several tests to make sure one of the objects was updated in memory whenever the other one was modified (both via ActiveRecord methods as well as normal attribute assignment). I made extensive use of metaprogramming to understand which associations needed to be used, and which methods needed to be overridden. I was finally able to find a solution that made the tests pass and was not too sketchy.

My contribution to the community

I have just published the column_sync gem in version 0.2.0. We need to follow two steps:

  1. Create the database migration
  2. Indicate in the models which attributes need to be propagated to associated models

The Readme contains some examples that illustrate how to use it:

# db/migrate/123456789_sync_columns.rb
class SyncColumns < ActiveRecord::Migration[7.1]
  def up
    sync_columns(Subscription => :country_code, Company => :country)
  end

  def down
    unsync_columns(Subscription => :country_code, Company => :country)
  end
end

# app/models/company
class Company < ApplicationRecord
  include ColumnSync::Model
  
  has_one :subscription

  sync_column :country, to: :subscription, column: :country_code
end

# app/models/subscription
class Subscription < ApplicationRecord
  include ColumnSync::Model

  belongs_to :company

  sync_column :country_code, to: :company, column: :country
end

The configuration above is enough to keep the two columns in sync:

# Create some sample data
Company.create!(country: "es")
Subscription.create!(country_code: "es", company: Company.first)

# The attribute is updated when a new value is assigned with country=
company = Company.first
company.country = "fr"
company.subscription.country_code
# => "fr"

company.subscription.country_code = "ca"
company.country
# => "ca"

# The attribute is also updated when the value is updated with using the
# regular ActiveRecord update
company = Company.first
company.update(country: "it")
company.subscription.country_code
# => "it"

company.subscription.update(country_code: "ma")
company.country
# => "ma

Learnings and next steps

I was not very familiar with the definition of functions and triggers in Rails applications, and this exercise has been very valuable to face this challenge for the first time.

The F(x) gem is already a great discovery itself. Once again, I want to thank the authors, from which I have learned a lot. Inspired by this gem, this is the first time I have defined a new migration command, opening a new world of opportunities.

As I mentioned above, the gem is not production-ready yet. It has not been battle-tested in real-world scenarios yet, and it would be irresponsible from my side to give that impression. I plan to work on new features and fixes until it is ready for general use. If you decide to use it in your project I will be very happy, but I have to warn you to use it with caution. All feedback and contributions are very welcome, of course.

Some of the improvements I am considering include (and may never be implemented):

  • Providing new migrations to override one of the columns with the contents of the other one. This will ease the initial step whenever we know upfront that one of the columns already contains correct values, and the other one has some discrepancies that need to be fixed.
  • Modifying the functions and triggers to work not only on updates but also when a new record is created.
  • Supporting 1:N associations (currently limited to 1:1).
  • Making the migration reversible so we don’t need sync_columns and unsync_column.

I hope you found this article useful. Happy coding!