Equational coreasoning

Equational coreasoning

Designing software in language that can't let you reason about your code is like building a house without architect. Fixing something afterward will be very hard. Very hard because you will not be able to measure the impact of a change.

We can use a very simple definition for equational reasoning: replacing equivalent expressions (the same value). Let's start with a short, rather uninteresting piece of code:

public static int Dummy1()
{
    var x = 1;
    var y = x++;
    var z = y + y;
    return z;
}

Try to think a little bit and ask yourself whether the following code will return the same value:

public static int Dummy2()
{
    var x = 1;
    var z = x++ + x++;
    return z;
}

Let's continue with this one:

public static int Dummy3()
{
    var x = 1;
    var y = x++;
    var z = y + x++;
    return z;
}

Don't scratch your head, you are trying to understand something that is very difficult to follow. In fact, we are not used to this kind of imperative thinking. Things are even worse when we mix pure code with side-effects. Here it is:

private static int _k;

public static int Dummy4(int x)
{
    var r = x * _k;
    _k++;
    return r;
}

public static int Dummy5()
{
    var x = 5;
    var y = Dummy4(x);
    var z = Dummy4(x);
    return y + z;
}

public static void Dummy6()
{
    var d = Dummy5();
    var dd = Dummy5();
    Console.WriteLine(dd);
    Console.WriteLine(_k);
}

public static void Main(string[] args)
{
    Dummy6();
    /*
        25
        4
    */
}

If we keep focusing on the Dummy5 function and try to reason about the code, we may judge that $y = z$. If we do remove $z$ to use $y$ only, the result of calling Dummy5 will be totally different (directly affecting Dummy6 and so on...).

private static int _k;

public static int Dummy4(int x)
{
    _k++;
    return x * _k;
}

public static int Dummy5()
{
    var y = Dummy4(5);
    return y + y;
}

public static void Dummy6()
{
    var d = Dummy5();
    var dd = Dummy5();
    Console.WriteLine(dd);
    Console.WriteLine(_k);
}

public static void Main(string[] args)
{
    Dummy6();
    /*
        20
        2
    */
}

Remember, we want to replace equivalent expressions. Unfortunately, we are not doing maths, in our case, we have to be sure that the expressions are free of side-effects. Let's say it, Dummy4 is not a function (in the mathematical sense) as it is not pure.

Pretty disastrous. Since we are not controlling the side-effects, we are forced to check the complete call tree. Now you realize that even if 90% of your code is pure, having a single side-effect in a leaf function infects the whole tree, resulting in an impure, top level function.

The big problems arise when you maintain a software that is relying on multiple libraries and a complex stack of tightly coupled objects. One day, your manager tell you to fix a bug. Reproducing this bug is easy: given a set of input comming from the client interface, clicking on the same button twice does not produce the same output. Enjoy having to debug and find the underlying side-effect.

Fortunately, escaping this situation is possible. It is a matter of thinking differently. We can't avoid side-effects as our program would be useless without them. Interacting with the real world is mandatory for our company: database access, console logging, API calls and so on... The ultimate solution is simple: identifiying the pure/impure functions and composing them correctly:

  • pure (business logic relying on data transformations only)
  • impure (access to real world stuff such as the database read/write)

We have to remember that combining a pure function with an impure one yield an impure function (infection). Doing this composition at the leaf of our program call tree is a bad idea as we are going to fall into the problem we just described. You know it, we should apply this composition at the top level. Let's rewrite our example:

// Our function is now pure as we are not using a global mutable state
public static int Dummy4(int x, int k)
{
    return x * k;
}

public static int Dummy5(int k)
{
    var x = 5;
    var y = Dummy4(x, k);
    var z = Dummy4(x, k + 1);
    return y + z;
}

private static int _k;
public static void Dummy6()
{
    var d = Dummy5(_k);
    _k += 2;
    var dd = Dummy5(_k);
    _k += 2;
    Console.WriteLine(dd);
    Console.WriteLine(_k);
}

public static void Main(string[] args)
{
    Dummy6();
    /*
        25
        4
    */
}

Our Dummy5 is now very clear. Stating that $z = y$ doesn't make sense any more, everything is explicit. Furthemore, following the program is much easier as the very bad mutable state has been extracted to the top level.

Generalizing this example to a more complex program is not a hard task. Imagine having multiple side-effects depending on each others, mixed with pure code, splitted accross different layers of the application. Evil software. Maintainance nightmare.

Finally, in pure functional programming languages such as Haskell, the distinction between the two types of functions is easy explicit as we have monads (something I will be talking about in a future article).

Thanks for reading.


https://en.wikipedia.org/wiki/Side_effect_(computer_science)
https://en.wikipedia.org/wiki/Referential_transparency
https://en.wikipedia.org/wiki/Monad_(functional_programming)

Show Comments