Ontology for Machine Learning Complexity

Every so often a really good article lands on my plate at just the right time.   This happened recently when I came across Ideas on Interpreting Machine Learning.

The question on my mind these days is this: how do we quantify Machine Learning complexity?  I am not referring to the complexity of the problem ML is helping solve, but rather to the complexity of the ML solution itself.

The software development life cycle is racing to adopt Machine Learning tools and practices. It remains unclear, however, how to manage and quantify ML complexity in ways similar to how IT pros have been doing it for some time.

Unnecessarily complex solutions create an array of problems in the software development life cycle such as demotivating teams, increasing costs and decreasing quality of the outputs produced.   In the case of ML, complexity is also a big deal because pending regulation will increasingly demand ML model understanding and transparency:

The law will also effectively create a “right to explanation,” whereby a user can ask for an explanation of an algorithmic decision that was made about them. We argue that while this law will pose large challenges for industry, it highlights opportunities for computer scientists to take the lead in designing algorithms and evaluation frameworks which avoid discrimination and enable explanation.

The current hype surrounding AI and ML is only widening the gap between ML deployments and the understanding stakeholders have regarding the models deployed.

In Ideas on Interpreting Machine Learning, the authors propose a number of tools to improve the “interpretability” of ML algorithms and models they produce, including an ontology to describe a model’s complexity as follows:

  • Linear, monotonic functions – describe ML models created by linear regression algorithms; probably the most interpretable class of models.  For a change in any given independent variable, the response function changes at a defined rate, in only one direction, and at a magnitude represented by a readily available coefficient.
  • Nonlinear, monotonic functions – there is no single coefficient that represents the change in the response function induced by a change in a single independent variable.  Nonlinear, monotonic functions do, however, always change in only one direction as a single input variable changes.
  • Nonlinear, nonmonotonic functions – most machine learning algorithms create nonlinear, nonmonotonic response functions.  This class of functions is the most difficult to interpret, as the response can change in both a positive and a negative direction, and at a varying rate, for changes in an independent variable.
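A minimal sketch of the three classes, using illustrative functions of my own choosing (not taken from the article), shows how monotonicity can be checked numerically:

```python
# Three illustrative response functions, one per complexity class.
def linear_monotonic(x):
    return 2.0 * x + 1.0   # one coefficient fully describes the change

def nonlinear_monotonic(x):
    return x ** 3          # rate of change varies, direction does not

def nonlinear_nonmonotonic(x):
    return x ** 2          # direction of change depends on where x is

def is_monotonic(f, xs):
    """Check whether f only ever moves in one direction over the points xs."""
    ys = [f(x) for x in xs]
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

xs = [x / 10.0 for x in range(-50, 51)]
print(is_monotonic(linear_monotonic, xs))        # True
print(is_monotonic(nonlinear_monotonic, xs))     # True
print(is_monotonic(nonlinear_nonmonotonic, xs))  # False
```

The third function is the hard case: a single coefficient or even a single direction of change cannot summarize its behavior, which is exactly why this class resists interpretation.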

Traditional techniques to measure IT complexity do not apply in Machine Learning. This is because Machine Learning flips traditional software engineering on its head, putting more focus on engineering the right input data instead of code.

This new ontology provides one way to describe ML model complexity and this is good news for maturing the role of ML in the software development life cycle.


Knowledge adds certainty to our doing things right and doing the right things.  We are knowledge workers after all, so we should strive to become really good at learning and at understanding how it can make us more knowledgeable.

Learning is the information economy’s production function.  Unlike the capital or land intensive production functions of other economies, this one requires primary inputs of data and information along with a healthy dose of human motivation, curiosity and time.  People learn by doing, reading, writing, listening and observing.  And as Drucker suggested, we should know which works best for us. 

When we learn, we apply our existing knowledge to extract relevance and purpose from data.  Data are just facts about things and our knowledge helps assemble these facts in a way that helps us know more.

Machine learning works in much the same way.  In supervised machine learning, a data scientist uses her domain-specific knowledge to extract a set of features from tons of data representing an observed phenomenon.  Each row of features is associated with a desired output, and an algorithm generates a model to optimally map the two.  The model is then capable of predicting the output value for a new row of features.  As more rows come in, this new data can be used to retrain the model and make it more accurate.  More accuracy means knowledge is gained and the model has learned.
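The loop described above can be sketched with a toy one-feature least-squares model (a minimal illustration of my own, not any particular library's API): fit on a few rows, then retrain as more rows arrive and watch the fit tighten.

```python
import random

def fit(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of the model over the rows."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# The observed phenomenon: roughly y = 3x + 2, plus noise.
random.seed(0)
truth = lambda x: 3 * x + 2 + random.uniform(-1, 1)

# First batch of rows: features paired with desired outputs.
xs = [float(i) for i in range(5)]
ys = [truth(x) for x in xs]
model = fit(xs, ys)

# More rows come in; retraining maps features to outputs more accurately.
xs += [float(i) for i in range(5, 50)]
ys += [truth(x) for x in xs[5:]]
retrained = fit(xs, ys)
print(model, retrained)
```

Because the retrained model is refit on the full set of rows, its error on that set can only match or improve on the original model's, which is the sense in which "the model has learned."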

Machine Learning’s Complexity Problem

I’ve been experimenting with machine learning lately.  For someone who started writing code in the early 90’s and witnessed firsthand the explosion of the web and all the software engineering practices that evolved from it, I find it amazing how machine learning flips traditional software engineering on its head.

Traditional software engineering taught us to divide and conquer, minimize coupling, maximize cohesion while artfully abstracting concepts in the problem domain to produce functional and maintainable code in the solution domain.  Our favorite static code analysis tools helped keep our code (and its complexity) in check.

Similarly, traditional software architecture taught us to worry less about code complexity and more about architectural complexity for it had farther reaching consequences. Architectural complexity had the potential to negatively impact teams, businesses and customers alike, not to mention all phases of the software development lifecycle.

Yes, this was the good ol’ world of traditional software engineering.

And machine learning flips this world on its head.  Instead of writing code,  the engineering team collects tons of input and output data that characterize the problem at hand.  Instead of carving component boundaries on concepts artfully abstracted from the problem domain,  engineers experiment with mathematics to unearth boundaries from the data directly.

And this is where machine learning’s complexity problem begins.  Training data sets rarely derive from a single cohesive set.  They instead depend on a number of  other data sets and algorithms.     Although the final training data set may be neatly organized as a large table of features and targets, the number of underlying data dependencies required to support this can be quite dramatic.
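A hypothetical sketch of what this looks like in practice: the tidy table of features and targets is really a join over several upstream datasets, any of which can silently change it (all dataset and field names below are invented for illustration).

```python
# Three upstream datasets that the "neat" training table depends on.
customers = {101: {"region": "EU"}, 102: {"region": "US"}}
clickstream = {101: {"clicks_7d": 14}, 102: {"clicks_7d": 3}}
billing = {101: {"churned": 0}, 102: {"churned": 1}}

def build_training_set():
    """Join the upstream sources into rows of (features, target)."""
    rows = []
    for cid in customers:
        features = {
            "region": customers[cid]["region"],
            "clicks_7d": clickstream[cid]["clicks_7d"],
        }
        target = billing[cid]["churned"]
        rows.append((features, target))
    return rows

training_set = build_training_set()
# A change to how clickstream counts clicks silently alters every
# downstream feature row: Changing Anything Changes Everything.
print(training_set)
```

Nothing in the final table records that it depends on three other datasets, which is precisely the dependency that static-code refactoring tools never had to track.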

Traditional software engineering became really good at refactoring away dependencies in static code and system architectures in order to tame the complexity beast; the challenge now is to do the same for data dependencies in machine learning systems.

In conclusion, the paper “Machine Learning: The High Interest Credit Card of Technical Debt” summarized this and a number of other ML complexity challenges nicely:

“No inputs are ever really independent. We refer to this here as the CACE principle: Changing Anything Changes Everything.”


AI 3.0

AI, Machine Learning (ML) and Deep Learning (DL) are all the hype these days, and for good reason. By now we know progress in AI accelerated over the past decade thanks to a convergence of factors including Big Data and compute power. And results are impressive as a recent Economist article highlights:

In February 2015 DeepMind published a paper in Nature describing a reinforcement-learning system capable of learning to play 49 classic Atari video games, using just the on-screen pixels and the game score as inputs, with its output connected to a virtual controller. The system learned to play them all from scratch and achieved human-level performance or better in 29 of them.

Over the next two years, many businesses will continue ramping up their ML/DL initiatives with the hope of improving every aspect of their business performance. These companies will follow a path similar to Major League Baseball’s pursuit of sabermetrics, or Wall Street’s appetite for algorithmic trading.

I think at some point in 2018, the latest wave of AI hype will peak and begin receding thereafter. Ongoing issues with model accuracy, as well as the high costs of operating models with less-than-stellar performance, will be two of the primary reasons behind this. I also believe decision-makers will feel increasingly vulnerable as AI effectively detaches them from understanding and refining the theories underlying their business performance.

This will usher in a new period of enlightenment  where companies adjust their be-all-end-all expectations of AI in favor of empowering their people to effectively coexist with AI.  This will be good news for workers too as Tyler Cowen suggests in Average is Over:

As intelligent-analysis machines become more powerful and more commonplace, the most obvious and direct beneficiaries will be the humans who are adept at working with computers and with related devices for communications and information processing. If a laborer can augment the value of a major tech improvement by even a small bit, she will likely earn well. This imbalance in technological growth will have some surprising implications. The key questions will be: Are you good at working with intelligent machines or not? If the answer is yes, then your wage and labor market prospects are likely to be cheery. If the answer is no, but you have an unusual ability to spot, recruit, and direct those who work well with computers, then the contemporary world will make you rich.

A short lesson on data

You can do a lot of things on the Internet but whatever you do requires data.  The Internet has a lot of data.  Some say roughly 1,000,000,000,000,000,000,000,000 GB of data are available but no one really knows the exact amount.

GB is short for gigabytes, or one billion bytes.  We measure the size of data in bytes.  One byte is equivalent to eight bits.  You generally need between one and four of these bytes to represent a single letter in the alphabet, or twenty of them for the average English word.

Data needs to be stored and retrieved.  Hard drives were designed for exactly this purpose.  Twenty years ago it cost $259 to store one GB of data on a hard drive. Today it costs just a few pennies, though many people now prefer storing their data directly on the Internet.  This serves them well considering their phones and tablets don’t even have traditional hard drives.

Data needs to be uploaded and downloaded on the Internet.  And this requires a network connection that moves data to and from a computer and the Internet. Five years from now the average Internet user will be transferring 37GB a year through their internet connections.

Data can be stolen.  Before the Internet, a thief needed to be physically close to a computer in order to steal the data stored on its hard drive.  After computers started connecting to the Internet, thieves could steal data from anywhere in the world.

You’re probably wondering why someone would steal another person’s data. A person steals another person’s data in order to hurt them.   Data is simply a recording of all the things people think and do in their lives.  Many times what a person thinks or does should remain private or should only be shared with a very small group of trusted people. This is our basic right but when a person steals our data they violate this right.   Sometimes the people we know and want to share our data with also violate our privacy when they accidentally make it available to someone else. 

Data also helps us make better decisions.   You are probably wondering “I make good decisions without needing data from the Internet,” and this is correct.  You rely on your judgement and intuition to make good decisions and this is how it should be.   With data however, you have a way of learning more about the facts that describe, explain or even predict a problem you are facing. When you take this data and apply some fancy math to it, you have a powerful new tool to help you answer tough questions.    And this is why the Internet is so powerful, it not only contains a lot of data (remember all the bytes I mentioned earlier), but it has the fancy math tools that help people make better decisions.

I know you are probably wondering, “If the Internet helps people make  decisions, can it also decide for itself?”  This is a great question, and makes for a great story another day. 

The lesson here is that data is very important in our lives and this will only increase as you grow older.  Learn to protect your data so it can only be seen by those people you trust.   Always rely on your judgement and intuition to make good decisions and learn how to use data to make better and more informed decisions. 

(for my two young daughters)



Simple is that ‘horse that left the barn’ but remains in your line of sight.  Chase her down and the problem is solved.  Apply best practices in horse management to ensure it doesn’t happen again.

Complicated is trickier.  It’s that feeling of being ‘caught between a rock and a hard place.’  You’re aware of being unaware of how to get out. Nevertheless, you are confident that good survival practices will help navigate you out of this mess soon enough.

Complexity grows each second you ‘grab the bull by the horns.’  Your best bet is to try things, get a sense for what works, and repeat. If you succeed in taming the wild beast, remember to reflect on your experience, teasing out useful knowledge that will help you repeat this success in the future.


We hear these idioms every day because we encounter these types of problems every day. Understanding the category of problem we are solving is the first step towards effectively solving it.

The really interesting part is to get better at ordering complex problems, thereby diminishing their complexity, or increasing the order of complicated ones so they become simpler.



A recent New York Times article suggests that Rome is falling apart.  This may come as no surprise considering similar articles have suggested the same over the past decade.  It is nonetheless strange news for a city and country blessed with forty-eight million tourists who visit their stunning countryside, beautiful cities, and cultural treasures each year.

My wife and I moved to Rome in 2009.   I spent many summers in Italy as a child, but it is by living here that I realized the city is dazzling almost entirely through the preservation and promotion of its past success. This has the net effect of pushing aside the practical everyday needs of Romans.

In Thomas Friedman’s much talked about book, The World is Flat, cities are compared to collaborative platforms for social and economic progress.  As an IT professional, I can tell you that a technology platform’s value hinges on what it offers being fit-for-purpose and how it offers this being fit-for-use.  Rome is prioritizing the preservation of storied relics over the renewal of everyday services.  This makes the city a better fit for the purposes of its visitors than those of its residents.

Similarly, no resident here will resist the notion that Rome is increasingly unfit-for-use.  There are many complex reasons for this. A video that went viral last week may have a simple one.  In it, bus driver Christian Rosso attributes the recent chaos in the city’s public transportation system to the large quantity of city buses parked in the garage awaiting maintenance.  In other words, they are unfit for use and this has exhausted the patience of Rome’s visitors and residents alike.

My point here is not to fuel the New York Times article and its ensuing firestorm.  The fact of the matter is that Rome is one of the nicest cities you’ll ever visit.  However, if the city is to become more valuable to current and future generations of tourists and residents, the mayor and his team need to propose services that satisfy the changing needs of its 21st-century residents.  They need to equally ensure these services work and can be relied upon throughout the year by residents and non-residents alike.

Value is an atomic, all-or-nothing proposition.  Uncovering it requires the wisdom and leadership to understand purpose, as well as the knowledge and management to ensure its uninterrupted availability.



A thought hit me the other day which I will briefly share with you in this post.  Read through today’s popular management journals and magazines and you’ll find numerous references to culture and its unique ability to influence quality of work and organizational performance.  Take for instance  Clayton Christensen’s brilliant portrayal in the widely popular article “How will you measure your life?“:

“Culture, in compelling but unspoken ways, dictates the proven, acceptable methods by which members of the group address recurrent problems. And culture defines the priority given to different types of problems.  It can be a powerful management tool.”

What hasn’t been clear, at least to me, are the characteristics of culture in achieving this influence.

If you agree with Clayton – that culture is a mechanism by which individuals prioritize and select ways to tackle recurring problems – then consider that this mechanism is inherently instinctive, not unlike the seemingly innate behaviors that characterize an individual’s unique talents.  So while culture and talent are conceptually different (e.g. the former underpinned by values, the latter by genetics), they both appear to promote instinctive and recurrent behaviors.  It is these same behaviors that can have a huge influence (i.e. positive or negative) on quality and performance.[1]

*** Notes ***

[1] In their book, First Break All the Rules, Marcus Buckingham and Curt Coffman suggest that a focus on talent offers the advantages of a strengths-based hiring approach.  One of these advantages is employee engagement, and as Tom Rath, author of StrengthsFinder 2.0, points out, “People who have the opportunity to focus on their strengths every day are six times as likely to be engaged in their jobs and more than three times as likely to report having an excellent quality of life in general”.



Some food for thought on product or service quality.  Deming defined it in relation to the value offered to the customer. Drucker had a similar customer-centric view when he said  “Quality is not what the supplier put in, but what the customer gets out and is willing to pay for”.  (Note: Deming did define a manufacturing centric view of quality in his effort divided by cost equation.)

Moving past traditional management science circles, I like Robert Pirsig’s philosophy on quality from his classic book, Zen and the Art of Motorcycle Maintenance.  Here, Pirsig presents quality not as a thing, but “as an event” – representing a path to discovery of the “right facts” between the creator and her creation.  When you apply his definition to knowledge work, it raises the question – do we understand how quality is affected by the relationship between a worker and the tools and materials with which she works?  Consider the elevated joy and satisfaction an individual derives from programming in Ruby vs. Visual Basic, for example.  Returning to the definition proposed by both Deming and Drucker, it’s easy to imagine how Pirsig’s interpretation of quality is the event that leads to the creation of customer value.

So there you have it: two perspectives on quality, one customer-centric, the other manufacturing-centric, both highly dependent on one another for the reasons Seth Godin presents in his quality of design vs. quality of manufacture post.

Can we therefore agree that in knowledge work, more important than our collective understanding of the characteristics that constitute ‘high-quality’ is the understanding of the subtle factors that allow these characteristics to emerge?


Agile #scale

A few years ago I would have had difficulty mentioning failure and Agile software development in the same breath. On the heels of the ever popular manifesto and effective practices such as XP and Scrum, Agile adoption grew, and the more it grew, the more software developers and managers felt empowered to beat the long and dismal history of software failure.


Now there’s increasing evidence to suggest that Agile software development and Agile management practices have finally earned the interest and attention of larger organizations, the same organizations that usually find comfort hiring from a pool of 400,000 management professionals carrying the widely recognized Project Management Professional (PMP) certification, offered by the Project Management Institute (PMI).  The certification’s popularity makes the PMI very influential in establishing the culture and practice of management within larger organizations.  The PMI has now turned its attention to Agile.

But in the spirit of Agile’s promotion of continuous feedback and adjustment, I’ve encountered quite a few challenges scaling agile in larger organizations.  Some of these challenges are structural, others cultural, and so it’s time for me to adjust my own tune on the realities that come from adopting Agile in such environments.

The following are four challenges confronting Agile practitioners in larger organizations:

  1. “System of reporting” differs from the “system of production” – The corporate hierarchy (i.e. the “system of reporting”) hinders the self-organization and cross-functional focus required for successful Agile teams.
  2. Financial cycles differ from management cycles which differ from project cycles – Excellent article by Jim Highsmith on the temporal challenges an iterative approach brings when the organization thinks and acts on a quarterly and yearly basis.
  3. Definition of done –  Procurement, budgeting and yearly reviews all necessitate a formal understanding of when the project will finish. You may even reach consensus on a scope and date to appease management but your first release plan that extends past the terms of this definition may present problems.
  4. Rewarding individuals over teams – Yearly corporate performance review programs focus on the individual, yet Agile makes no provisions for this kind of evaluation; in fact, it can be detrimental (pdf) to the team’s trust and self-organization.

What challenges have you encountered scaling Agile in larger organizations? How are you overcoming them?