
Machine Learning’s Complexity Problem

I’ve been experimenting with machine learning lately. For someone who started writing code in the early ’90s and witnessed firsthand the explosion of the web and all the software engineering practices that evolved from it, I find it amazing how machine learning flips traditional software engineering on its head.

Traditional software engineering taught us to divide and conquer, minimize coupling, and maximize cohesion while artfully abstracting concepts in the problem domain to produce functional, maintainable code in the solution domain. Our favorite static code analysis tools helped keep our code (and its complexity) in check.

Similarly, traditional software architecture taught us to worry less about code complexity and more about architectural complexity, for it had farther-reaching consequences. Architectural complexity had the potential to negatively impact teams, businesses, and customers alike, not to mention every phase of the software development lifecycle.

Yes, this was the good ol’ world of traditional software engineering.

And machine learning flips this world on its head. Instead of writing code, the engineering team collects tons of input and output data that characterize the problem at hand. Instead of carving component boundaries from concepts artfully abstracted out of the problem domain, engineers experiment with mathematics to unearth boundaries from the data directly.

And this is where machine learning’s complexity problem begins. Training data sets rarely derive from a single cohesive set. They instead depend on a number of other data sets and algorithms. Although the final training data set may be neatly organized as a large table of features and targets, the number of underlying data dependencies required to support this can be quite dramatic.
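To make the fan-in concrete, here is a minimal sketch of how a tidy features-and-targets table is typically assembled. The data set names (clicks, purchases, labels) are my own illustrative assumptions, not from any particular system:

```python
# Hypothetical sketch: a "single" training table fanning in from several
# upstream data sets, each of which is a hidden dependency of the model.

clicks = {1: 12, 2: 3}          # user_id -> clicks in the last 7 days
purchases = {1: 40.0, 2: 0.0}   # user_id -> spend in the last 30 days
labels = {1: 0, 2: 1}           # user_id -> churned? (the target)

# The tidy rows of features and a target hide three upstream dependencies;
# a schema or pipeline change in any one of them silently alters the table.
training = [
    (uid, clicks[uid], purchases[uid], labels[uid])
    for uid in sorted(clicks)
]
print(training)
```

Each of those upstream sources usually has its own pipeline and its own upstream sources in turn, which is where the dependency count quietly explodes.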

Traditional software engineering became really good at refactoring away dependencies in static code and system architectures in order to tame the complexity beast; the challenge now is to do the same for data dependencies in machine learning systems.

In conclusion, the paper “Machine Learning: The High Interest Credit Card of Technical Debt” summarizes this and a number of other ML complexity challenges nicely:

“No inputs are ever really independent. We refer to this here as the CACE principle: Changing Anything Changes Everything.”
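CACE is easy to see even in the smallest possible model. The sketch below is my own illustration, not from the paper: a two-feature least-squares fit (no intercept) solved by hand via the 2x2 normal equations. Changing values in only the first feature moves the learned weight on the untouched second feature too, because the inputs are entangled through the fit:

```python
# Hedged illustration of CACE (Changing Anything Changes Everything):
# a tiny two-feature linear model where perturbing one input shifts
# *all* learned weights, not just the weight on the changed input.

def fit_two_feature_ols(x1, x2, y):
    """Solve the 2x2 normal equations for y ~ w1*x1 + w2*x2 (no intercept)."""
    a11 = sum(v * v for v in x1)
    a12 = sum(a * b for a, b in zip(x1, x2))
    a22 = sum(v * v for v in x2)
    b1 = sum(a * b for a, b in zip(x1, y))
    b2 = sum(a * b for a, b in zip(x2, y))
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

y = [2, 3, 5]
w_before = fit_two_feature_ols([1, 2, 3], [1, 1, 2], y)  # -> (1.0, 1.0)
# An upstream change touches only the *first* feature's data...
w_after = fit_two_feature_ols([1, 2, 4], [1, 1, 2], y)   # -> (0.6, 1.4)
# ...yet the weight on the untouched second feature moves from 1.0 to 1.4.
print(w_before, w_after)
```

The same entanglement that lets a model exploit correlations between inputs is what makes every input a dependency of every learned parameter.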



Simple is that ‘horse that left the barn’ but remains in your line of sight. Chase her down and the problem is solved. Apply best practices in horse management to ensure it doesn’t happen again.

Complicated is trickier. It’s that feeling of being ‘caught between a rock and a hard place.’ You’re aware of being unaware of how to get out. Nevertheless, you are confident that good survival practices will help you navigate out of this mess soon enough.

Complexity grows each second you ‘grab the bull by the horns.’ Your best bet is to try things, get a sense of what works, and repeat. If you succeed in taming the wild beast, remember to reflect on your experience, teasing out useful knowledge that will help you repeat this success in the future.


We hear these idioms every day because we encounter these types of problems every day. Understanding the category of problem we are solving is the first step toward solving it effectively.

The really interesting part is getting better at bringing order to complex problems, thereby diminishing their complexity, or at increasing the order of complicated ones so they become simple.