Principles of Software Engineering: Minimizing Complexity

Welcome to Principles of Software Engineering, where I, Jon Crain, a self-taught programmer and ex-Facebook employee, write about software engineering.

Today I'm going to write about minimizing complexity in software. This is, I think, the single most important concept in software engineering. Why? Complexity increases cost. It increases the time needed to debug a system, which is the single most time-intensive part of the development process. It increases the time needed to read code; code is read far more often than it is written, and a good measure of a software engineer's level is how clear and simple their code is. And complexity compounds: each complex piece of code multiplies the interactions the next change has to account for, so the surface area you must understand grows much faster than the line count does.

First, let's define and flesh out complexity. There is inherent complexity: how complex your code must be to solve the problem at all. We can't minimize that. For a contrived example, consider a Python program that averages a list of numbers without using the standard library. There are many ways to write it, but no matter what, you're going to have to sum all the numbers and then divide by how many numbers you have.
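To make the inherent part concrete, here is a minimal sketch of that averaging program. The function name is my own choice, but however you structure it, the summing loop and the final division are irreducible:

```python
def average(numbers):
    """Average a list of numbers using no library helpers at all."""
    total = 0
    count = 0
    for n in numbers:
        total += n   # the sum is unavoidable
        count += 1   # as is knowing how many numbers there are
    return total / count
```

Everything beyond those two accumulations and the division would be incidental.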

Then there is incidental complexity, which is what we can actually minimize: any overhead that doesn't actually solve the problem. In real-world applications, I find this often arises as too little abstraction, producing code that is complex because it is hard to find the critical section you want to modify. A good indicator of this is a function that runs longer than a few paragraphs of text.

It can also be caused by too much abstraction: taking a simple problem and breaking it into so many layers, or imposing such a non-intuitive framework on it, that the result is harder to follow than the problem warrants. Complexity of this type often lives not in the lines of code themselves but in how that code calls other code: the connections between pieces, rather than the complexity of any given function. Good indicators are code that calls too many functions to complete a simple task, control flow that bounces from Class A to Class B and back to Class A, or unnecessary design patterns.

For example, say you have a simple GET endpoint in a REST application whose job is to take an ID parameter and return information about a single store item.

With too little abstraction, you might receive the request, parse out the parameter, run the business logic that checks the user has access to the item, retrieve the information from the database, and build the JSON response, all in one function.

With too much abstraction, you might receive the request, create a new ParameterParser object for the item, call a PermissionEnforcerFactory to create a PermissionEnforcer object, pass your ParameterParser into it to get back a PermissionResponse object, deserialize and confirm that, then make a gRPC call to your database query service (which actually calls the database) to get your data, and finally make a REST call to your JSONSerializer service to get back your JSON response. There is a big middle ground between these two extremes, and that is where we want to stay on the spectrum. The over-engineered version above could hypothetically be justified by a compelling need. What is never justifiable is something like the ParameterParser calling back into the controller, creating a circular dependency. Mixing multiple protocols, as above, is also hard to justify and a likely source of incidental complexity.
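For contrast, a sketch of that middle ground: the same hypothetical endpoint split into a few small, plainly named in-process functions, with no factories, extra services, or protocol hops. The names and in-memory stores are illustrative assumptions, not a prescribed design:

```python
import json

# Hypothetical stand-ins for a real database and permission store.
FAKE_DB = {"42": {"name": "Widget", "price": 9.99}}
FAKE_PERMISSIONS = {"alice": {"42"}}

def parse_item_id(request):
    return request["params"]["id"]

def user_can_read(user, item_id):
    return item_id in FAKE_PERMISSIONS.get(user, set())

def fetch_item(item_id):
    return FAKE_DB.get(item_id)

def get_item(request):
    """Each concern is a small function; the handler just composes them."""
    item_id = parse_item_id(request)
    if not user_can_read(request["user"], item_id):
        return {"status": 403, "body": json.dumps({"error": "forbidden"})}
    item = fetch_item(item_id)
    if item is None:
        return {"status": 404, "body": json.dumps({"error": "not found"})}
    return {"status": 200, "body": json.dumps(item)}
```

Each piece can be found, tested, and changed on its own, yet the call graph stays a straight line with no cycles and no cross-service chatter.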

There is also room for incidental complexity in the code itself. For example, perhaps our JSON serializer is written to solve too broad a problem and can parse a near-infinite number of nested objects, when it would be simpler to handle nesting only up to a reasonable, configurable depth. You can also have the opposite problem: a serializer that can only handle strings and not numbers, say, which forces a future developer to patch on or hack a solution.
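As an illustration, here is a sketch of a serializer scoped that way: it handles the common cases but refuses to recurse past a configurable depth. The depth limit and function name are my own choices for the example, not a real library's API:

```python
import json

def serialize(obj, max_depth=8, _depth=0):
    """Serialize nested dicts/lists/scalars to JSON, but refuse to recurse
    past max_depth rather than supporting arbitrary nesting."""
    if _depth > max_depth:
        raise ValueError(f"nesting deeper than {max_depth} levels")
    if isinstance(obj, dict):
        items = (f"{json.dumps(str(k))}: {serialize(v, max_depth, _depth + 1)}"
                 for k, v in obj.items())
        return "{" + ", ".join(items) + "}"
    if isinstance(obj, list):
        return "[" + ", ".join(serialize(v, max_depth, _depth + 1)
                               for v in obj) + "]"
    return json.dumps(obj)  # strings, numbers, booleans, None
```

The explicit limit keeps the scope honest: callers who genuinely need deeper nesting raise the configurable limit and thereby justify it, instead of the serializer silently absorbing unbounded structure.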

You can see that these decisions sit on a spectrum, and there is no single right answer to how much abstraction to use or how broad a problem to solve, but you should be able to justify the choices you make. The goal should be to make the actual complexity of the software you write approach the inherent complexity of the problem, while weighing a myriad of other factors. Those factors may include performance (concurrency and matrices are fast but harder to reason about), compute vs. storage vs. network costs, organizational factors (how many Java developers do you have? How many can you hire? How many teams does your platform support?), testability, scalability (how granular should your microservices be?), operability (how long should you retain logs? Should you sample?), and many others, which I'll write about in upcoming articles.

Enjoyed this article? Tip me on GoFundMe.

© 2023 Jon Crain. All Rights Reserved.