Oftentimes, we make bad decisions while developing software. Depending on the nature, the cost of reverting these decisions may not be affordable, and you will have to live with that forever.
Today I’d like to cover a mistake occasionally made by Rails developers who, motivated by the excitement of having an application up and running, forgot to read the Polymorphic associations chapter in the Rails guides. Most of the articles in my blog come from personal experiences, so I have to admit I made this mistake as well. And that’s fine, because if I had not made this mistake, probably I would not have investigated how to fix it, and I would not have learned along the way.
Making the mistake
Let’s consider a dummy application named Rater. This application keeps a collection of users and products, as well as ratings associated with these records. Additionally, claims can be associated with a user rate, including a message that describes the claim.
Following a very naive approach, let’s create the models User
, Product
, UserRate
, ProductRate
and UserClaim
:
rails new --database=postgresql rater
cd rater
rails g model User name:string
rails g model Product name:string
rails g model UserRate user:references rate:integer
rails g model ProductRate product:references rate:integer
rails g model UserClaim user_rate:references message:string
rake db:setup
Yay! We’ve got our application up and running. Let’s update our db/seeds.rb file to populate some data:
# db/seeds
USERS = 1000
PRODUCTS = 1000
MAX_RATES = 100
UserClaim.delete_all
UserRate.delete_all
ProductRate.delete_all
User.delete_all
Product.delete_all
users = []
user_rates = []
products = []
product_rates = []
user_claims = []
(1..USERS).each { |i| users << { name: "User #{i}" } }
User.insert_all(users)
User.pluck(:id).each do |user_id|
rand(MAX_RATES).times { user_rates << { rate: rand(10), user_id: } }
end
UserRate.insert_all(user_rates)
UserRate.select(:id).order('RANDOM()').limit(USERS / 10).pluck(:id).each do |user_rate_id|
user_claims << { user_rate_id:, message: 'This claim is so unfair!' }
end
UserClaim.insert_all(user_claims)
(1..PRODUCTS).each { |i| products << { name: "Product #{i}" } }
Product.insert_all(products)
Product.pluck(:id).each do |product_id|
rand(MAX_RATES).times { product_rates << { rate: rand(10), product_id: } }
end
ProductRate.insert_all(product_rates)
The code above is basically creating 1000 users and 1000 products, and adding around 50 ratings to each one of them. After running rake db:seed
, we have 49728 records in the user_rates
table and 49595 records in the product_rates
table. We also have 100 records in the user_claims
table, referencing 100 different user rates.
For the sake of simplicity, we will assume the has_many
and belongs_to
methods are properly used in each one of the models, according to the database structure we have just created.
While this solution works, we usually find ourselves making the same changes to UserRate
and ProductRate
models. We naturally ask ourselves if there was a way to unify this into a single model. We start researching and find a link to the Rails Guides in some random post in StackOverflow, and we regret not reading the full guide before jumping straight into coding.
The mistake was made, and we wish we could go back and change the data structure for the ratings. But our tables already have thousands (millions!) of records, and we definitely don’t want to lose that information.
Intermediate steps
Right after reading the guide (this time we read all the way down to the STI section, just in case), a natural idea comes to mind: Let’s create a Rate
model, associated with User
and Product
as a polymorphic association, and let’s move all records from user_rates
and product_rates
into the rates
table, specifying in each case the right rateable_id
and rateable_type
:
Even though the overall idea is good, there are some details that need to be taken into account. Our user_claims.user_rate_id
is referencing user_rates.id
, so we need to preserve the id
(primary key) when we move the user_rates
into the rates
table. Otherwise, the rate referenced by the user claim may not be correct, or may not even exist!
However, if we just insert all user_rates
and product_rates
into the rates
table, we will have a primary key collision, as user rates and product rates with the same ID (integer, sequentially incremented) will exist in both tables.
We can certainly overcome this situation by making the rates
table use a composite primary key, using the rateable_id
and the rateable_type
columns for that. However, while that is a feasible solution, it introduces additional complications we choose not to cover in this article (lack of unique identifiers for URLs, composite foreign keys, etc.). We are aiming to make the rates
table look the same as if we knew about polymorphism since the beginning.
Before we can insert all records into the rates
table, we need to transform the primary keys of the *_rates
tables (as well as the columns in other tables that are referencing these values) into UUIDs. I wrote an article specifically on that topic, so we won’t spend too much time on the details.
Making use of the gem webdack-uuid_migration
(version 1.5 or above!), we transform these columns from ID to UUID:
require 'webdack/uuid_migration/helpers'
class UuidMigration < ActiveRecord::Migration[7.0]
def change
enable_extension "uuid-ossp"
primary_key_and_all_references_to_uuid :product_rates, seed: SecureRandom.uuid
primary_key_and_all_references_to_uuid :user_rates, seed: SecureRandom.uuid
end
end
The snippet above achieves two goals:
- Migrates the column
product_rates.id
from integer to UUID using one seed. - Migrates the column
user_rates.id
and the columnuser_claims.user_rate_id
from integer to UUID using a different seed.
Using a different seed for both operations is crucial, as otherwise two identical integer keys would be transformed into two identical UUIDs, leading to the same primary key collision analyzed before.
Merging tables
After running the migration, we are ready to create a rates
table, prepared for a polymorphic association:
class CreatePolymorphicTable < ActiveRecord::Migration[7.0]
def change
create_table :rates, id: :uuid, default: 'gen_random_uuid()' do |t|
t.references :rateable, polymorphic: true
t.integer :rate
t.timestamps
end
add_index :rates, %i[rateable_type rateable_id], if_not_exists: true
end
end
We also need to create a new model Rate
, which is referred to as rateable
due to the polymorphic nature of the relation.
class Rate < ApplicationRecord
belongs_to :rateable, polymorphic: true
end
Now we have an empty rates
table, and we need to move all records from user_rates
and product_rates
. We will write another migration that inserts all records from this table into the rates
table, making sure the rateable_type
is properly set. This migration may take some time to execute depending on the size of the original tables:
class PolymorphicDataMigration < ActiveRecord::Migration[7.0]
def up
command = <<-SQL
INSERT INTO rates(id, rateable_id, rateable_type, rate, created_at, updated_at)
SELECT id, user_id, 'User', rate, created_at, updated_at
FROM user_rates;
INSERT INTO rates(id, rateable_id, rateable_type, rate, created_at, updated_at)
SELECT id, product_id, 'Product', rate, created_at, updated_at
FROM product_rates;
SQL
ActiveRecord::Base.connection.execute(command)
end
def down
ActiveRecord::Base.connection.execute("TRUNCATE TABLE rates")
end
end
Right after running this migration, we have:
3.1.2 :001 > UserRate.count
UserRate Count (5.8ms) SELECT COUNT(*) FROM "user_rates"
=> 49367
3.1.2 :002 > ProductRate.count
ProductRate Count (12.0ms) SELECT COUNT(*) FROM "product_rates"
=> 49078
3.1.2 :003 > Rate.count
Rate Count (19.5ms) SELECT COUNT(*) FROM "rates"
=> 98445
All records have been successfully inserted into the rates table
. However, the user_claims
table still has a foreign key pointing at the user_rates
table. We need to update the foreign key to point at the just-created rates
table. The foreign key will still be valid, as the same seed has been used during the transformation of the user_rates.id
and user_claims.user_rate_id
columns. We should also rename the user_rate_id
column, as it will no longer be referencing the old user_rates
table:
class UpdateForeignKeys < ActiveRecord::Migration[7.0]
def change
remove_foreign_key :user_claims, :user_rates
rename_column :user_claims, :user_rate_id, :rate_id
add_foreign_key :user_claims, :rates
end
end
The last step is to remove the tables we no longer need. We are making this migration reversible just in case we have to rebuild the original tables from the rates
table:
class Cleanup < ActiveRecord::Migration[7.0]
def up
drop_table :user_rates
drop_table :product_rates
end
def down
create_table :user_rates, id: :uuid, default: 'gen_random_uuid()' do |t|
t.references :user, null: false, foreign_key: true
t.integer :rate
t.timestamps
end
create_table :product_rates, id: :uuid, default: 'gen_random_uuid()' do |t|
t.references :product, null: false, foreign_key: true
t.integer :rate
t.timestamps
end
command = <<-SQL
INSERT INTO user_rates(id, user_id, rate, created_at, updated_at)
SELECT id, rateable_id, rate, created_at, updated_at
FROM rates
where rateable_type = 'User';
INSERT INTO product_rates(id, product_id, rate, created_at, updated_at)
SELECT id, rateable_id, rate, created_at, updated_at
FROM rates
where rateable_type = 'Product';
SQL
ActiveRecord::Base.connection.execute(command)
end
end
Summary
After running these migrations and updating the models properly (out of the scope of this article), we will have achieved what we wanted at the beginning of the article: unifying UserRate
and ProductRate
into a single model and table, while keeping the existing relations in place.
Additional work will probably be needed in some of the models in order to make this solution bulletproof. For example, our model UserClaim
should reflect that it’s rate
is scoped to those with “User” rateable_id
:
class UserClaim < ApplicationRecord
belongs_to :rate, -> { where(rateable_type: 'User') }
end
Now that our Rate
model is using UUIDs as primary keys, it may be worth specifying some implicit_order_column.
These kinds of adjustments in the models need to be reviewed carefully to make sure the refactored application makes sense.
Last but not least, the database migrations in this example have been executed with a few thousand records. In a real production environment, certain migrations can take significantly longer to run, and may even block some operations on the database (especially inserts and updates). You should carefully analyze the situation and analyze how this will impact users of your application.