Lines of Code

If you want to know how complex a piece of software is, how can you tell? A simple approach is to count lines of code (LOC). Lines of Code is just a count of the number of lines of source code, excluding comment lines and blank lines.
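
To make the definition concrete, here is a minimal sketch of a LOC counter in Python. The function name `count_loc` and the `comment_prefix` parameter are my own inventions for illustration. It assumes comments are whole lines starting with `//`, so it ignores block comments and treats a line of code with a trailing comment as code. A real tool would need to handle those cases.

```python
def count_loc(source: str, comment_prefix: str = "//") -> int:
    """Count lines that are neither blank nor whole-line comments.

    Assumption: a comment is any line whose first non-blank characters
    are comment_prefix. Block comments are not recognised.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


# A small C-like sample: one comment line, four code lines, one blank line.
sample = """\
// add two numbers
int add(int a, int b)
{
    return a + b;
}

"""

print(count_loc(sample))  # prints 4
```

For comparison, `sample.count(";")` gives 1 for the same text, which shows how far apart the two counts can be.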

As soon as I say this at work, someone will pipe up that it would be better to count semi-colons, or get into a discussion about how bracketing style can affect the count. My experience is that bracketing style doesn’t make much difference, maybe 10%. Semi-colon count can be quite different, but is it any more meaningful?

There are many more sophisticated measures of complexity. For example, there is McCabe’s cyclomatic complexity, which counts the number of linearly independent paths through the code, and there are Halstead’s complexity measures, which are based on counts of the operators and operands in the code.
However, I recently read about an empirical study which shows that all of these complexity measures are highly correlated with lines of code. The conclusion is “Whether based on program structure or textual properties, the metrics do not provide more information than simply ‘weighing’ the code by counting the number of lines.”
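
To give a feel for what a measure like McCabe’s involves, here is a rough sketch that approximates cyclomatic complexity for Python source using the standard `ast` module. It uses the common shortcut of one plus the number of decision points. The choice of which node types count as decision points (`DECISION_NODES`) is my assumption, and real tools differ on the details, so treat the result as an approximation rather than the definitive metric.

```python
import ast

# Node types treated as decision points (an assumption of this sketch).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity as 1 + number of decision points."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' has three operands, so two extra branches.
            decisions += len(node.values) - 1
    return decisions + 1


example = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
"""

print(cyclomatic_complexity(example))  # prints 6
```

The example function has nine lines of code and a complexity of 6: two from the `if`/`elif`, one from the `for`, one from the inner `if`, one from the `and`, plus one. Longer functions tend to have more branches, which is one intuition for why the two numbers track each other.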

A couple of rules of thumb. Norris’ Number is 1,500 lines: “The average amount of code an untrained programmer can write before he or she hits a wall.” In my experience, at about 10,000 lines of code you have to start introducing some big architecture into the system to partition the code. For scale, Windows XP is estimated to have 45,000,000 lines of code.

It is worth stressing that lines of code, or any of these other metrics, measure how big the program is. Size is an indicator of how difficult it will be to learn, maintain or change the software. These metrics are poor measures of how much the software “does”, so they are not a good indicator of programmer productivity or program functionality.