When I was studying Computer Science at university, one of the principles I learned about was DRY (Don’t Repeat Yourself). Don’t get me wrong, I don’t disagree with this principle as a general rule. However, there are a few scenarios where I think it hurts more than it helps.
This blog post is all about personal opinions on the DRY topic. Other people may think differently, and that’s totally fine. If I can at least put my ideas together in a way that makes some sense, I’ll consider my goal accomplished.
DRY in tests
I strongly believe tests are one of the exceptions where we should be OK with code duplication. When I’m building actual application code, there are places where I would never allow duplication. For example, if I’m designing an integration with a 3rd party, and their HTTP API lives at a certain URL, more often than not I’ll extract that to a constant and reuse it across the service. The same goes for, say, a hard-coded pagination limit that will be the same for all requests to that API: I’ll define a (private) constant and reuse it across the service.
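To make that concrete, here is a minimal sketch of what I mean (the class name and methods are hypothetical, not taken from any real codebase):

require "httparty"

# Hypothetical HTTP client for a third-party animal rescue API.
# The base URL and the pagination limit are extracted to constants
# and reused across the service.
class AnimalRescueClient
  BASE_URL = "https://animal-rescue.example"
  DEFAULT_LIMIT = 10
  private_constant :DEFAULT_LIMIT

  def list_animals(type:)
    HTTParty.get("#{BASE_URL}/pets?type=#{type}&limit=#{DEFAULT_LIMIT}")
  end

  def list_shelters
    HTTParty.get("#{BASE_URL}/shelters?limit=#{DEFAULT_LIMIT}")
  end
end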
When it comes to the specs, however, I feel inclined not to reuse the constants (or, in general, the code) I have written for the application itself. There are a few reasons for that.
TDD
I use TDD as much as possible, so by the time I’m writing the application code, the test is already there. Hence, I cannot reuse application code in the specs, because that code doesn’t exist yet when the specs are being written.
One of the many advantages of using TDD is that the definition of the specs is not biased by the way the application code has been written. If I write the expectations upfront, and I can be explicit about what exactly I expect from the method (without any weird abstractions that would make the code less comprehensible), I feel it will be harder to make mistakes.
Tests as documentation
Tests are documentation, and they help readers understand how a class or method will behave. If we’re making constant references to the application code, the reader is forced to read the application code and the specs side by side to understand the whole picture. I personally like to read clear, comprehensible documentation that doesn’t force me to follow 10 cognitive redirections to understand what a method is doing.
For example, I very much prefer to read the following spec:
describe "animal rescue HTTP client" do
# 200 lines of specs for different methods
describe "list animals" do
it "makes an HTTP GET request to the pets endpoint applying filters and limit conditions" do
service.call
expect(HTTParty).to have_received(:get).with("https://animal-rescue.example/pets?type=dog&limit=10")
end
end
end
than its functionally equivalent version:
describe "animal rescue HTTP client" do
# 200 lines of specs for different methods
describe "list animals" do
let(:path) { "pets" }
let(:default_limit) { 10 }
it "makes an HTTP GET request to the pets endpoint applying filters and limit conditions" do
service.call
expect(HTTParty).to have_received(:get)
.with("#{described_class::BASE_URL}/#{path}?type=#{Animals::DOG}&limit=#{default_limit}")
end
end
end
The former is explicit about what service.call will do. The latter forces me to check the value of described_class::BASE_URL, what Animals::DOG is (will it be dog, dogs, a code representing dogs, …?) and the value of default_limit. In these cases, even if those are constants meant to be reused in the application code, I strongly prefer to see the explicit dog, 10 and URL.
But what if, in the future, we change the default limit from 10 to something else? In that case, the cost of doing a find & replace on this file is negligible, so I’m not acquiring a big technical debt. But even if it took me 15 minutes to tell that 10 apart from the other 10s in the file, I think it would still be worth it. Plus, if someone uses the DRY argument here, we may very well counter with the YAGNI one: how likely is it that we stop using dog for dogs, or 10 to limit the number of results?
Not just a rewrite
Sometimes, I encounter specs that are a rewritten version of the actual application code. For example, I may find a method:
def total_amount(base, extras)
  base * 1.5 + extras ** 2
end
and the specs:
describe ".total_amount"
it { expect(descibed_class.total_amount(1, 1)).to eq(1 * 1.5 + 1 ** 2) }
it { expect(descibed_class.total_amount(2, 3)).to eq(2 * 1.5 + 3 ** 2) }
end
I am very sure that the test will pass, as it’s essentially the same code as the application code itself. What I don’t know is whether that number is what I was actually expecting. What would happen if, for example, I meant (base * 1.5 + extras) ** 2 instead? What if I assumed a different operator precedence? I would have made the same mistake in both the application code and the specs, and I would end up with buggy code that has full test coverage.
I would find it much more convenient to do:
describe ".total_amount"
it { expect(descibed_class.total_amount(1, 1)).to eq(2.5) }
it { expect(descibed_class.total_amount(2, 3)).to eq(12.0) }
end
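To make the difference concrete, here is a small, hypothetical sketch of the precedence mistake mentioned above and how the two styles of spec would react to it:

# Hypothetical buggy implementation: the whole expression gets squared
# because of misplaced parentheses.
def total_amount(base, extras)
  (base * 1.5 + extras) ** 2
end

total_amount(1, 1) # => 6.25, not the 2.5 I actually wanted

# The spec with the hard-coded 2.5 fails and exposes the bug. A spec that
# mirrors the (buggy) formula would also compute 6.25 and happily pass.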
This is another example of how tests can be influenced by the application code (for the worse) if they are written after the subject code is implemented.
DRY in abstractions
I am sure I didn’t invent this principle myself, but it’s been in my head for so long that I no longer know where I actually got it from, or how much of it comes from the original source and how much I added myself.
Oftentimes, I find myself working on a piece of code that is somewhat similar to another piece of code in the codebase. My first instinct: DRY, reuse, create an abstraction to cover both, etc. This idea works well in many cases, but especially when we’re new to a domain, we may not be able to identify what is a coincidence and what is inherent to all cases.
A few times during my career I’ve worked on feature A, then on feature B, refactored them into a common abstraction, and then needed feature C, which just didn’t fit the abstraction I created for A and B. That forced me to either undo my work or modify the abstraction to forcefully fit C. Then D came along, and we had to modify the abstraction again for one little detail. How bizarre, huh? Our abstraction is not actually abstract, as it heavily depends on the whims of the different implementations. At the end of the day, I have a piece of code that is very complex and hard to understand, and I often regret reaching for the abstraction too early.
What I tend to do in cases like this is to allow a certain level of duplication. Of course, it depends on the case, and mostly on how familiar I am with the domain we’re talking about. If the uncertainty is high, I am OK with having multiple copies (up to ~3-4) of similar code for different integrations. Once I’ve gone through all the integrations, I have a better understanding of what they have in common (if anything), and can work on a refactor to reduce code duplication. But I can only know what the shape of that abstraction will be once I’m familiar with the domain, and that familiarity only comes from implementing all the integrations, each with its particular details. We may even find that integrations A, B, C and D are quite similar and can be refactored, while E is completely different and will not fit any abstraction we define for the rest. That is fine, and it’s something we could not have foreseen if we had built the abstraction right after implementing A, while B was still being conceived.
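As a rough illustration (the integrations and class names here are hypothetical), this is the kind of duplication I’m happy to live with until the domain is better understood:

require "httparty"

# Two shipping integrations that look similar today. The payloads overlap,
# but each has its own quirks, so they live as separate, slightly duplicated
# classes instead of being forced into a shared abstraction prematurely.
class AcmeShipmentsClient
  def create_shipment(order)
    payload = { reference: order.id, weight_kg: order.weight }

    HTTParty.post("https://acme.example/shipments", body: payload.to_json)
  end
end

class GlobexShipmentsClient
  def create_shipment(order)
    # Globex wants imperial units and a customer block; a shared abstraction
    # would already need to branch on these details.
    payload = {
      order_ref: order.id,
      weight_lb: order.weight * 2.2,
      customer: { name: order.customer_name }
    }

    HTTParty.post("https://globex.example/v2/shipments", body: payload.to_json)
  end
end

Once a third or fourth integration exists, the parts that are genuinely common (if any) become much easier to spot and extract.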
I hope you find this blog post interesting. Happy coding!