Oftentimes, we make bad decisions while developing software. Depending on the nature, the cost of reverting these decisions may not be affordable, and you will have to live with that forever.

Today I’d like to cover a mistake occasionally made by Rails developers who, motivated by the excitement of having an application up and running, forgot to read the Polymorphic associations chapter in the Rails guides. Most of the articles in my blog come from personal experiences, so I have to admit I made this mistake as well. And that’s fine, because if I had not made this mistake, probably I would not have investigated how to fix it, and I would not have learned along the way.

Making the mistake

Let’s consider a dummy application named Rater. This application keeps a collection of users and products, as well as ratings associated with these records. Additionally, claims can be associated with a user rate, including a message that describes the claim.

Initial database structure

Following a very naive approach, let’s create the models User , Product , UserRate, ProductRate and UserClaim:

rails new --database=postgresql rater
cd rater

rails g model User name:string
rails g model Product name:string

rails g model UserRate user:references rate:integer
rails g model ProductRate product:references rate:integer

rails g model UserClaim user_rate:references message:string

rake db:setup

Yay! We’ve got our application up and running. Let’s update our db/seeds.rb file to populate some data:

# db/seeds

USERS = 1000
PRODUCTS = 1000
MAX_RATES = 100

UserClaim.delete_all
UserRate.delete_all
ProductRate.delete_all
User.delete_all
Product.delete_all

users = []
user_rates = []

products = []
product_rates = []

user_claims = []

(1..USERS).each { |i| users << { name: "User #{i}" } }
User.insert_all(users)

User.pluck(:id).each do |user_id|
  rand(MAX_RATES).times { user_rates << { rate: rand(10), user_id: } }
end
UserRate.insert_all(user_rates)

UserRate.select(:id).order('RANDOM()').limit(USERS / 10).pluck(:id).each do |user_rate_id|
  user_claims << { user_rate_id:, message: 'This claim is so unfair!' }
end
UserClaim.insert_all(user_claims)

(1..PRODUCTS).each { |i| products << { name: "Product #{i}" } }
Product.insert_all(products)

Product.pluck(:id).each do |product_id|
  rand(MAX_RATES).times { product_rates << { rate: rand(10), product_id: } }
end

ProductRate.insert_all(product_rates)

The code above is basically creating 1000 users and 1000 products, and adding around 50 ratings to each one of them. After running rake db:seed, we have 49728 records in the user_rates table and 49595 records in the product_rates table. We also have 100 records in the user_claims table, referencing 100 different user rates.

For the sake of simplicity, we will assume the has_many and belongs_to methods are properly used in each one of the models, according to the database structure we have just created.

While this solution works, we usually find ourselves making the same changes to UserRate and ProductRate models. We naturally ask ourselves if there was a way to unify this into a single model. We start researching and find a link to the Rails Guides in some random post in StackOverflow, and we regret not reading the full guide before jumping straight into coding.

The mistake was made, and we wish we could go back and change the data structure for the ratings. But our tables already have thousands (millions!) of records, and we definitely don’t want to lose that information.

Intermediate steps

Right after reading the guide (this time we read all the way down to the STI section, just in case), a natural idea comes to mind: Let’s create a Rate model, associated with User and Product as a polymorphic association, and let’s move all records from user_rates and product_rates into the rates table, specifying in each case the right rateable_id and rateable_type:

Model structure after unifying all rates in a single model

Even though the overall idea is good, there are some details that need to be taken into account. Our user_claims.user_rate_id is referencing user_rates.id, so we need to preserve the id (primary key) when we move the user_rates into the rates table. Otherwise, the rate referenced by the user claim may not be correct, or may not even exist!

Original relation between user_claims and user_rates

The relation is broken as a result of rewriting the IDs of the user_rates without updating the user_claims

However, if we just insert all user_rates and product_rates into the rates table, we will have a primary key collision, as user rates and product rates with the same ID (integer, sequentially incremented) will exist in both tables.

Primary key collision

We can certainly overcome this situation by making the rates table use a composite primary key, using the rateable_id and the rateable_type columns for that. However, while that is a feasible solution, it introduces additional complications we choose not to cover in this article (lack of unique identifiers for URLs, composite foreign keys, etc.). We are aiming to make the rates table look the same as if we knew about polymorphism since the beginning.

Before we can insert all records into the rates table, we need to transform the primary keys of the *_rates tables (as well as the columns in other tables that are referencing these values) into UUIDs. I wrote an article specifically on that topic, so we won’t spend too much time on the details.

Making use of the gem webdack-uuid_migration (version 1.5 or above!), we transform these columns from ID to UUID:

require 'webdack/uuid_migration/helpers'

class UuidMigration < ActiveRecord::Migration[7.0]
  def change
    enable_extension "uuid-ossp"

    primary_key_and_all_references_to_uuid :product_rates, seed: SecureRandom.uuid
    primary_key_and_all_references_to_uuid :user_rates, seed: SecureRandom.uuid
  end
end

The snippet above achieves two goals:

  1. Migrates the column product_rates.id from integer to UUID using one seed.
  2. Migrates the column user_rates.id and the column user_claims.user_rate_id from integer to UUID using a different seed.

Using a different seed for both operations is crucial, as otherwise two identical integer keys would be transformed into two identical UUIDs, leading to the same primary key collision analyzed before.

Merging tables

After running the migration, we are ready to create a rates table, prepared for a polymorphic association:

class CreatePolymorphicTable < ActiveRecord::Migration[7.0]
  def change
    create_table :rates, id: :uuid, default: 'gen_random_uuid()' do |t|
      t.references :rateable, polymorphic: true
      t.integer :rate
      t.timestamps
    end

    add_index :rates, %i[rateable_type rateable_id], if_not_exists: true
  end
end

We also need to create a new model Rate, which is referred to as rateable due to the polymorphic nature of the relation.

class Rate < ApplicationRecord
  belongs_to :rateable, polymorphic: true
end

Now we have an empty rates table, and we need to move all records from user_rates and product_rates. We will write another migration that inserts all records from this table into the rates table, making sure the rateable_type is properly set. This migration may take some time to execute depending on the size of the original tables:

class PolymorphicDataMigration < ActiveRecord::Migration[7.0]
  def up
    command = <<-SQL
      INSERT INTO rates(id, rateable_id, rateable_type, rate, created_at, updated_at)
        SELECT id, user_id, 'User', rate, created_at, updated_at
        FROM user_rates;

      INSERT INTO rates(id, rateable_id, rateable_type, rate, created_at, updated_at)
        SELECT id, product_id, 'Product', rate, created_at, updated_at
        FROM product_rates;
    SQL
    ActiveRecord::Base.connection.execute(command)
  end

  def down
    ActiveRecord::Base.connection.execute("TRUNCATE TABLE rates")
  end
end

Right after running this migration, we have:

3.1.2 :001 > UserRate.count
  UserRate Count (5.8ms)  SELECT COUNT(*) FROM "user_rates"
 => 49367 
3.1.2 :002 > ProductRate.count
  ProductRate Count (12.0ms)  SELECT COUNT(*) FROM "product_rates"
 => 49078 
3.1.2 :003 > Rate.count
  Rate Count (19.5ms)  SELECT COUNT(*) FROM "rates"
 => 98445 

All records have been successfully inserted into the rates table. However, the user_claims table still has a foreign key pointing at the user_rates table. We need to update the foreign key to point at the just-created rates table. The foreign key will still be valid, as the same seed has been used during the transformation of the user_rates.id and user_claims.user_rate_id columns. We should also rename the user_rate_id column, as it will no longer be referencing the old user_rates table:

class UpdateForeignKeys < ActiveRecord::Migration[7.0]
  def change
    remove_foreign_key :user_claims, :user_rates
    rename_column :user_claims, :user_rate_id, :rate_id
    add_foreign_key :user_claims, :rates
  end
end

The last step is to remove the tables we no longer need. We are making this migration reversible just in case we have to rebuild the original tables from the rates table:

class Cleanup < ActiveRecord::Migration[7.0]
  def up
    drop_table :user_rates
    drop_table :product_rates
  end

  def down
    create_table :user_rates, id: :uuid, default: 'gen_random_uuid()' do |t|
      t.references :user, null: false, foreign_key: true
      t.integer :rate

      t.timestamps
    end

    create_table :product_rates, id: :uuid, default: 'gen_random_uuid()' do |t|
      t.references :product, null: false, foreign_key: true
      t.integer :rate

      t.timestamps
    end

    command = <<-SQL
      INSERT INTO user_rates(id, user_id, rate, created_at, updated_at)
        SELECT id, rateable_id, rate, created_at, updated_at
        FROM rates
        where rateable_type = 'User';

      INSERT INTO product_rates(id, product_id, rate, created_at, updated_at)
        SELECT id, rateable_id, rate, created_at, updated_at
        FROM rates
        where rateable_type = 'Product';
    SQL

    ActiveRecord::Base.connection.execute(command)
  end
end

Summary

After running these migrations and updating the models properly (out of the scope of this article), we will have achieved what we wanted at the beginning of the article: unifying UserRate and ProductRate into a single model and table, while keeping the existing relations in place.

Additional work will probably be needed in some of the models in order to make this solution bulletproof. For example, our model UserClaim should reflect that it’s rate is scoped to those with “User” rateable_id :

class UserClaim < ApplicationRecord
  belongs_to :rate, -> { where(rateable_type: 'User') }
end

Now that our Rate model is using UUIDs as primary keys, it may be worth specifying some implicit_order_column.

These kinds of adjustments in the models need to be reviewed carefully to make sure the refactored application makes sense.

Last but not least, the database migrations in this example have been executed with a few thousand records. In a real production environment, certain migrations can take significantly longer to run, and may even block some operations on the database (especially inserts and updates). You should carefully analyze the situation and analyze how this will impact users of your application.