Lanning Blog

DRY is about consistency

First, I apologize for writing yet another™ blog about DRY. Much of what I write here is described in more depth by the talented Vladimir Khorikov. Hopefully I can add a different perspective.

DRY was first described in The Pragmatic Programmer. In the recently re-released edition, the authors admit it is one of the most misunderstood programming principles around. Incorrect interpretations have likely done millions of dollars' worth of damage. The author of The Art of Unix Programming goes so far as to say DRY is better phrased as SPOT (Single Point of Truth).

For those new to DRY: it is an acronym for "Don't Repeat Yourself". It means that if you have a duplicated piece of information, whether it be a variable, a function, or a clump of data, putting it in a single unambiguous location can reduce complexity.

In the case of a variable, this may mean moving a constant that is declared inside two functions into a single shared place.

function foo() {
    const int velocityMphMax = 40;
    // .. Code here ..
}

function bar() {
    const int velocityMphMax = 40;
    // .. Code here ..
}

// After refactoring:
const int velocityMphMax = 40;

function foo() {
    // .. Code here ..
}

function bar() {
    // .. Code here ..
}

In the case of a function, it would mean extracting some code block that's duplicated in foo and bar into its own function. In the case of a clump of data, it might mean creating a new class and refactoring the functions to take that class as a parameter.
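The function and data-clump cases can be sketched the same way. This is only an illustration in TypeScript: Address, formatAddress, shippingLabel and invoiceHeader are made-up names, not taken from any real codebase. The three address fields that used to travel together become one type, and the formatting logic that two callers used to repeat lives in one function.

```typescript
// Hypothetical names throughout; this is a sketch, not a prescribed design.

// A "clump" of values that always travel together becomes one type.
interface Address {
  street: string;
  city: string;
  postalCode: string;
}

// Logic that two callers used to duplicate now lives in one function.
function formatAddress(address: Address): string {
  return `${address.street}, ${address.city} ${address.postalCode}`;
}

// Both callers take the Address type and reuse the shared function.
function shippingLabel(recipient: string, address: Address): string {
  return `Ship to: ${recipient}, ${formatAddress(address)}`;
}

function invoiceHeader(recipient: string, address: Address): string {
  return `Invoice for: ${recipient}, ${formatAddress(address)}`;
}

const home: Address = {
  street: "1 Main St",
  city: "Springfield",
  postalCode: "12345",
};

console.log(shippingLabel("Ada", home));
console.log(invoiceHeader("Ada", home));
```

If the address format ever changes, only formatAddress needs to be touched.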

Most people think this is good because if a bug is fixed in one place, it is propagated throughout the system. That's true, but we'll discuss the deeper value later.

Here's where many programmers get DRY wrong.

  1. Confusing coincidental duplication with essential duplication. Coincidental duplication is when one bit of code merely happens to look like another. For example, if the max size of an image is 640x640, a naïve programmer might delete the const int heightMax = 640;¹ variable because const int widthMax = 640; has the same value! This is DRY done wrong. It complicates the codebase because maintainers cannot tell whether this is intentional, as in, the max height must equal the max width (in which case sizeMax might be a better name), or whether the original author simply misunderstood DRY. This can happen with functions and objects as well. For brevity, I will omit a function example. Unfortunately, it is done with objects very, very often, and it looks something like this:
class CustomerOrder {
  Date OrderDate;
  string CustomerName;
  string CustomerAddress;
  int AmountInCents;
  // .. More properties here ..
}

class BillingOrder {
  Date OrderDate;
  string AccountName;
  string AccountAddress;
  int AmountInCents;
  // .. More properties here ..
}

Hey, these two classes share a few similar-looking properties! Let's merge them together into a class called Order and have some properties be null if it's a customer order and other properties be null if it's a billing order. No! Don't do this! It's DRY done wrong, and it is bound to cause headaches in the future, especially as the two contexts diverge more and more. OK, I won't merge them together, but can I have a base class called Order and have them inherit from it? Please don't do this either! It results in the same problems, and the case for avoiding inheritance is best left to another post.

  2. Getting DRY right but being overzealous about it. This is a matter of taste, but some developers are prone to compressing code too much, even when they are doing it correctly. A little bit of duplication is sometimes more readable and maintainable than a ton of DRY-ed up code. This is why things like the Rule of Three exist, along with many blog posts about why small functions are bad. We won't get into that here.
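To make the first pitfall concrete, here is a rough TypeScript sketch of the merged class next to the separate ones. The names MergedOrder, mergedRecipient, customerRecipient and billingRecipient are hypothetical, and the properties are trimmed down from the classes above. The point is only that merging pushes "which context am I in?" onto every consumer.

```typescript
// Hypothetical sketch; property lists are shortened for illustration.

// Merged version: one type serves two contexts, so fields become optional.
interface MergedOrder {
  orderDate: Date;
  amountInCents: number;
  customerName?: string; // only set for customer orders
  accountName?: string; // only set for billing orders
}

// Every consumer now has to guess which context it is in.
function mergedRecipient(order: MergedOrder): string {
  return order.customerName ?? order.accountName ?? "unknown";
}

// Separate versions: each context owns its type and can change freely.
interface CustomerOrder {
  orderDate: Date;
  amountInCents: number;
  customerName: string;
}

interface BillingOrder {
  orderDate: Date;
  amountInCents: number;
  accountName: string;
}

// No null checks are needed, because each type guarantees its own fields.
function customerRecipient(order: CustomerOrder): string {
  return order.customerName;
}

function billingRecipient(order: BillingOrder): string {
  return order.accountName;
}
```

With the separate types, adding a billing-only property later does not touch anything on the customer side.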

Now that we've covered the essentials, here's the deeper value of DRY: consistency. If you ask many devs about important principles, consistency will rank high on the list, and there's a good reason for that. It is hard to express exactly why consistency is so important, and devs will differ on how important it is, but it is universally agreed upon as "a good thing". Fred Brooks (famous for The Mythical Man-Month), in The Design of Design, called it one of the most important principles of design. Here's how DRY relates to consistency: if a codebase is DRY, there is one way to do things. There is not some function with minor differences duplicated throughout five different files; there is exactly one place.

The code is compressed in a lossless way. Most developers recognize that fewer LOC in a codebase is (generally) a good sign, and this is one of the reasons why. This frees up the maintainer's mind. They do not have to check whether there is any odd nuance between the duplicated code blocks. They do not have to choose between the n different duplicated blocks, hoping they chose the right one. There is only "One Way".
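As a small illustration of the "odd nuance" problem, consider two near-duplicate helpers as they might appear in different files. This is a made-up TypeScript example; formatPriceCheckout, formatPriceInvoice and formatPrice are hypothetical names.

```typescript
// Hypothetical near-duplicates, as they might appear in two different files.
function formatPriceCheckout(amountInCents: number): string {
  return `$${(amountInCents / 100).toFixed(2)}`;
}

function formatPriceInvoice(amountInCents: number): string {
  // Subtle difference: drops the cents before formatting.
  // Is that a bug or a requirement? The maintainer has to find out.
  return `$${Math.floor(amountInCents / 100).toFixed(2)}`;
}

// The DRY version: one function, one behavior, nothing to compare.
function formatPrice(amountInCents: number): string {
  return `$${(amountInCents / 100).toFixed(2)}`;
}

console.log(formatPriceCheckout(1999));
console.log(formatPriceInvoice(1999));
console.log(formatPrice(1999));
```

The two duplicates disagree on an input of 1999 cents, and nothing in the code says which one is right. With a single formatPrice there is nothing to reconcile.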

DRY is strongly linked to consistency, and consistency is about reducing entropy.

tl;dr DRY reduces entropy. It all flows back to information theory.


¹ Notice I'm using widthMax instead of maxWidth? I encourage you to try it out. It can result in greater symmetry throughout the codebase, and the important part of the variable name comes first. Others might argue that maxWidth reads more like natural English, though.

const int speedMphAverage;
const int velocityMphAverage;
const int speedMphMax;
const int velocityMphMax;

(Source: Code Complete 2nd Edition)