Code Entropy

While preparing my talk at Smidig 2008 I kept thinking about the second law of thermodynamics, and about how it might apply to code.

Consider two snippets of code, A and B, that have exactly the same external behaviour. If an expert programmer is more likely to change A into B than B into A, then snippet A has higher code entropy than snippet B.
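
One way to write this definition down, as a sketch only: the symbols P (the probability that an expert programmer rewrites one snippet into the other) and H (code entropy) are notation introduced here for illustration, not something taken from the talk.

```latex
A \equiv B \;\wedge\; P(A \to B) > P(B \to A) \;\implies\; H(A) > H(B)
```

Read this way, the definition only orders two behaviourally equivalent snippets relative to each other; it does not assign either one a number.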

Let us consider a very simple example:

{ int a=3; while (a<9) { ... ; a++; } } // snippet A

and

{ for (int a=3; a<9; a++) { ... } } // snippet B

The external behaviour of these two code snippets is exactly the same. However, most programmers would agree that snippet A is better rewritten as snippet B. So, during a refactoring session, it is likely that someone will change A into B, but unlikely that someone will change B into A. Hence snippet A has higher code entropy than snippet B.

Now, extend this idea to larger functions, classes, modules, applications, software design and architecture. Can entropy be used to describe the state of a codebase?

4 Responses to Code Entropy

  1. Neat idea. Code entropy seems to be very much like what others call “technical debt.” It would be interesting to see if the concepts could be combined/contrasted.

  2. Kjetil V. says:

    Entropy, as in ‘likelihood of being changed’. It seems to me that the concept is related to that of idiomatic code. The for loop in B has become an idiom because it is superior: the counter variable is more restricted in scope and cannot be confusingly re-used. And idioms are, by definition, exactly that: What most programmers would do.

  3. In terms of a composite measure of ‘mess’ you might want to check out the idea of code toxicity described by Erik Doernenburg from ThoughtWorks:

    http://erik.doernenburg.com/2008/11/how-toxic-is-your-code/

  4. Balog Pal says:

    The snippets are really NOT identical.
    If you have ‘continue’ anywhere in the … part they will behave differently.

    A “rewrite” has to take that into account, and the “natural” formulation also depends on what is actually done, what the role of a is, how it is changed, etc.

    The for() form fits when the only modification of a is the a++ and it acts like an iteration counter. However, a could also be some phase/state marker that keeps changing inside the loop, in which case while() is likely better.

    As we write code for a particular purpose, not in general, overly stripped-down snippets are not really useful. ;-)
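
Balog Pal's point about ‘continue’ can be made concrete with a small sketch in C. The function names, the choice of a == 5 as the value that triggers ‘continue’, and the max_iterations safety cap are all hypothetical, added only so that the difference is observable and the demonstration terminates. They stand in for the elided “...” part and are not taken from the original snippets.

```c
/* Snippet B shape: in a for loop, `continue` jumps to the increment
 * expression, so a++ still runs and the loop terminates normally. */
int count_for_iterations(void) {
    int iterations = 0;
    for (int a = 3; a < 9; a++) {
        iterations++;
        if (a == 5) {
            continue;           /* a++ is still executed after this */
        }
    }
    return iterations;
}

/* Snippet A shape: in a while loop, `continue` jumps straight back to
 * the condition and skips the trailing a++, so a stays at 5 forever.
 * max_iterations is a hypothetical cap that stops the runaway loop. */
int count_while_iterations(int max_iterations) {
    int iterations = 0;
    int a = 3;
    while (a < 9) {
        if (iterations >= max_iterations) {
            break;              /* safety cap, not part of snippet A */
        }
        iterations++;
        if (a == 5) {
            continue;           /* skips the a++ below */
        }
        a++;
    }
    return iterations;
}
```

Under these assumptions the for version runs its body six times, while the while version only stops because of the cap. So the two snippets have the same external behaviour only when the elided body contains no ‘continue’ that applies to this loop.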