ActiveRecord is, in my opinion, one of the most powerful tools bundled with Rails. Dealing with ruby objects and collections instead of plain rows and SQL queries is very convenient in most cases. The query interface is just wonderful. However, we sometimes find a conflict between what we would like to do in an object-oriented language and what the database demands to stay efficient.
In today’s article, we will tweak ActiveRecord to define a read-only attribute that is calculated as an aggregated function on another table.
Original scenario
Let’s define our pet project as a very simple invoice application. There is nothing super special about the setup, so we will not stop at the details. The database schema is very simple: we have an invoices
table and a payments
table where we store the payments associated with an invoice:
ActiveRecord::Schema[7.0].define(version: 2023_10_01_130449) do
# These are extensions that must be enabled in order to support this database
enable_extension "plpgsql"
create_table "invoices", force: :cascade do |t|
t.text "summary"
t.string "name", null: false
t.datetime "supplied_at", null: false
t.datetime "invoiced_at", null: false
t.integer "total_in_cents", null: false
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.check_constraint "total_in_cents > 0", name: "positive_amount"
end
create_table "payments", force: :cascade do |t|
t.bigint "invoice_id", null: false
t.integer "amount_in_cents", null: false
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.index ["invoice_id"], name: "index_payments_on_invoice_id"
end
add_foreign_key "payments", "invoices"
end
Our Invoice
model has_many :payments
and our Payment
model belongs_to :invoice
. Nothing fancy so far.
Defining a seeds file is very convenient to play around with our application without needing to create test data every time:
# db/seeds.rb
Payment.delete_all
Invoice.delete_all
100.times do |i|
invoice = Invoice.create!(
summary: 'Invoice summary',
name: "Customer #{i}",
total_in_cents: 100 * rand(50..200),
supplied_at: Time.zone.now,
invoiced_at: Time.zone.now
)
invoice.payments.create!(amount_in_cents: 30_00)
invoice.payments.create!(amount_in_cents: 20_00)
invoice.payments.create!(amount_in_cents: invoice.total_in_cents - 50_00)
invoice.payments.create!(amount_in_cents: 10_00) if (i % 7).zero?
end
We have also defined a method in our Invoice
model to find out its status. Four possible statuses are considered:
- paid when the sum of the payments matches the total amount.
- unpaid when there is no payment associated with the invoice.
- underpaid when the sum of the payments is less than the total amount.
- overpaid when the sum of the payments is more than the total amount.
Our Invoice#status
looks like this:
# app/models/invoice.rb
def status
case payments.sum(:amount_in_cents)
when total_in_cents
:paid
when 0
:unpaid
when 1...total_in_cents
:underpaid
else
:overpaid
end
end
And we can use it like:
3.2.0 :001 > Invoice.first.status
Invoice Load (0.4ms) SELECT "invoices".* FROM "invoices" ORDER BY "invoices"."id" ASC LIMIT $1 [["LIMIT", 1]]
Payment Sum (0.4ms) SELECT SUM("payments"."amount_in_cents") FROM "payments" WHERE "payments"."invoice_id" = $1 [["invoice_id", 1]]
=> "overpaid"
Refactoring target
The status
method certainly works, and that is the least we can ask of a method. We even have unit tests to make sure it works as expected. Everything is all right. However, after some years with the application running, we found the following piece of code was using an abnormal amount of memory:
def overpayments
Invoice.all.select{ |invoice| invoice.status == "overpaid" }.first(10)
end
This method selects the first ten invoices that are overpaid. However, it is loading all of them into memory just to take a few of them.
Having lost all context about how the model is built, my first instinct was to replace the select
filter with a where(status: :overpaid)
clause, and the first(10)
by a limit(10)
. This way, the database would take care of filtering the records for us, and the application would get exactly what it needs. However, I failed to realize that we are filtering the records in memory because the status of the invoice is evaluated in memory — it is not persisted in the table. We cannot apply the limit
clause to the query, as we don’t know how many records are going to be filtered out.
I had the strong feeling that this was not something we should be doing in memory. But I also didn’t have a straightforward way to solve this. A couple of ideas came to mind:
- Create a view that aggregates the payments associated with the invoice and returns the status. While this is the most straightforward, we would not be able to use our ActiveRecord models to query the view. Besides that, we don’t need the invoice status every time we load an object, so loading objects from the view by default would be overkill. We would also have some code duplication, as the status definition would be present in the ruby code as well as in the view definition.
- Dynamically define the
status
attribute on the query result by using something likeInvoice.select("*", "'foo' as status")
. TheInvoice
objects returned by this query will respond to thestatus
method just as if it were another column in the table. The status definition would still be duplicated in Ruby and SQL, though. - Pretend I did not see that code and move forward, which looked appealing for a split second.
None of these ideas solved the entire problem, so I decided to adapt the second one trying to remove that definition duplication.
Defining the status attribute with a scope
Since I wanted to evaluate the case-when structure in the database, I thought it would be a good idea to define a function using Pl/pgSQL. That would make the application code easier to understand, so I created the following migration:
class DefineInvoiceStatusFunction < ActiveRecord::Migration[7.0]
def up
command = <<~SQL
CREATE OR REPLACE FUNCTION invoice_status(total_in_cents integer, total_paid bigint) RETURNS varchar AS $$
BEGIN
RETURN CASE
WHEN total_paid = total_in_cents THEN 'paid'
WHEN total_paid = 0 THEN 'unpaid'
WHEN total_paid > total_in_cents THEN 'overpaid'
ELSE 'underpaid'
END;
END;
$$ LANGUAGE plpgsql;
SQL
ActiveRecord::Base.connection.execute(command)
end
def down
ActiveRecord::Base.connection.execute('DROP FUNCTION IF EXISTS invoice_status(integer, bigint);')
end
end
We can easily test our function with plain old SQL:
SELECT
invoice_status(10, 0),
invoice_status(10, 10),
invoice_status(10, 5),
invoice_status(10, 20)
-- unpaid, paid, underpaid, overpaid
The next step was defining a scope that defines a new attribute with the total amount paid per policy.
# app/models/invoice.rb
class Invoice < ApplicationRecord
PAYMENTS_SUBQUERY = Payment.group('invoice_id')
.select(:invoice_id, 'SUM(amount_in_cents) as total_paid')
.to_sql
.freeze
has_many :payments
scope :with_total_paid, lambda {
select('invoices.*')
.joins("LEFT OUTER JOIN (#{PAYMENTS_SUBQUERY}) subq ON invoices.id = subq.invoice_id")
.select('subq.total_paid as total_paid')
}
# ...
end
I used a subquery to get the total amount paid by invoice. Then I used a LEFT OUTER JOIN
with the invoices
table to include this new column in the result. It is important to notice how this scope selects all columns in the invoices table (to create the regular Invoice
object) and then selects an additional value from the subquery and returns it as total_paid
. This attribute can be accessed as any other method in our model:
3.2.0 :001 > Invoice.with_total_paid.limit(2).map(&:total_paid)
=> [6800, 12600]
We are still not getting our payment status, but we are getting closer. With this scope available, we can easily define a new scope that evaluates invoices_status
to give us what we are expecting. I had to remove the original status
definition, as this method is now defined dynamically when the result is returned. I also included an optional parameter to filter by one or more statuses:
# app/models/invoice.rb
scope :with_status, lambda { |statuses = nil|
query = with_total_paid.select('invoice_status(total_in_cents, total_paid) as status')
if statuses
query = query.where(
'invoice_status(total_in_cents, total_paid) IN (?)',
Array(statuses)
)
end
query
}
This new scope can be used like:
3.2.0 :001 > Invoice.with_status.limit(2).map(&:status)
=> ["overpaid", "paid"]
3.2.0 :002 > Invoice.with_status(:overpaid).limit(2).map(&:status)
=> ["overpaid", "overpaid"]
3.2.0 :003 > Invoice.with_status([:overpaid, :paid]).limit(2).map(&:status)
=> ["overpaid", "paid"]
This solution solved one of our issues and allowed us to do the filter and limit in the database. The status is retrieved on demand (when the scope is used) in the same query as the other fields in the model. However, we have not solved the code duplication issue yet, as the status
instance method is no longer defined in the model.
Reusing the status scope from an instance method
We have a way to load the status
when we are retrieving a collection of objects, but our codebase may be loading the invoice objects from somewhere else (relations, direct load, etc.). We need to provide a replacement for the original status
method in a way that we reuse the scope we have defined for the collection.
The first approach I tried was just defining the method as:
# app/models/invoice.rb
def status
self.class.with_status.find_by(id:).status
end
However, we get into an infinite loop due to recursion. ActiveRecord will only define the status
method introduced in the scope if the method is not defined already. Thus, by explicitly defining the status
method in the model we broke the dynamic status
method defined by the scope.
We have one last trick though. We need the status
method to be defined for objects returned by the scope, and we want the status
method to be defined differently for objects loaded without the scope. There is a little trick we can use to achieve our goal:
# app/models/invoice.rb
class Invoice
# ...
private
def method_missing(*args)
return super unless args.first == :status
return super if respond_to?(:status)
self.class.with_status.find(id).status
end
We are using the method_missing method to intercept the status
method calls on objects that don’t have this method defined. ActiveRecord uses a similar approach to define the status
method dynamically, so we need to find out whether the method is going to be defined for this object by ActiveRecord or not. If it does not, we should define our implementation.
We can now use the status
method as a regular instance method:
3.2.0 :001 > Invoice.first.status
=> "overpaid"
Final comments
If I had to implement from scratch a feature to get a list of invoices filtered by status, I would probably create a custom query object for that specific use case and not mess up with the existing model. However, the approach discussed in this article can be useful in those cases where the method has been defined for ages and is used all over the place.
The status-related parts of the invoices model (scopes and status
instance method) may be refactored to a separate module to hide implementation details, which will enable us to easily define multiple aggregated attributes in the same model.
Happy coding!