Category Archives: Technology

Machine Learning’s Complexity Problem

I’ve been experimenting with machine learning lately. For someone who started writing code in the early ’90s and witnessed firsthand the explosion of the web and all the software engineering practices that evolved from it, I find it amazing how machine learning flips traditional software engineering on its head.

Traditional software engineering taught us to divide and conquer, minimize coupling, and maximize cohesion while artfully abstracting concepts in the problem domain to produce functional and maintainable code in the solution domain. Our favorite static code analysis tools helped keep our code (and its complexity) in check.

Similarly, traditional software architecture taught us to worry less about code complexity and more about architectural complexity, for it had farther-reaching consequences. Architectural complexity had the potential to negatively impact teams, businesses, and customers alike, not to mention every phase of the software development lifecycle.

Yes, this was the good ol’ world of traditional software engineering.

And machine learning flips this world on its head. Instead of writing code, the engineering team collects tons of input and output data that characterize the problem at hand. Instead of carving component boundaries from concepts artfully abstracted from the problem domain, engineers experiment with mathematics to unearth boundaries from the data directly.

And this is where machine learning’s complexity problem begins. Training data sets rarely derive from a single cohesive source. They instead depend on a number of other data sets and algorithms. Although the final training data set may be neatly organized as a large table of features and targets, the number of underlying data dependencies required to support it can be quite dramatic.

Traditional software engineering became really good at refactoring away dependencies in static code and system architectures in order to tame the complexity beast; the challenge now is to do the same for data dependencies in machine learning systems.

In conclusion, the paper “Machine Learning: The High-Interest Credit Card of Technical Debt” summarized this and a number of other ML complexity challenges nicely:

“No inputs are ever really independent. We refer to this here as the CACE principle: Changing Anything Changes Everything.”
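
The CACE principle is easy to demonstrate. Here is a toy Python sketch (my illustration, not from the paper): a training table derived from two upstream sources, where a fix to one source silently shifts every derived feature even though no “model code” changed.

```python
# Hypothetical upstream data sources feeding a training table.
raw_clicks = {"user_a": 12, "user_b": 3}
raw_purchases = {"user_a": 2, "user_b": 1}

def build_features(clicks, purchases):
    """Derive a per-user feature row from the upstream sources."""
    return {
        user: {
            "clicks": clicks[user],
            "conversion_rate": purchases.get(user, 0) / clicks[user],
        }
        for user in clicks
    }

features = build_features(raw_clicks, raw_purchases)

# A change to how clicks are logged upstream...
raw_clicks["user_a"] = 24
# ...changes every feature derived from it downstream.
features_after = build_features(raw_clicks, raw_purchases)
```

Changing anything changed everything: the conversion-rate feature moved without anyone touching the feature-building code.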


AI 3.0

AI, Machine Learning (ML) and Deep Learning (DL) are all the hype these days, and for good reason. By now we know progress in AI accelerated over the past decade thanks to a convergence of factors including Big Data and compute power. And results are impressive as a recent Economist article highlights:

In February 2015 DeepMind published a paper in Nature describing a reinforcement-learning system capable of learning to play 49 classic Atari video games, using just the on-screen pixels and the game score as inputs, with its output connected to a virtual controller. The system learned to play them all from scratch and achieved human-level performance or better in 29 of them.

Over the next two years, many businesses will continue ramping up their ML/DL initiatives with the hope of improving every aspect of their business performance. These companies will follow a path similar to Major League Baseball’s pursuit of sabermetrics, or Wall Street’s appetite for algorithmic trading.

I think at some point in 2018, the latest wave of AI hype will peak and begin receding thereafter. Ongoing issues with model accuracy, as well as the high cost of operating models with less-than-stellar performance, will be two of the primary reasons behind this. I also believe decision-makers will feel increasingly vulnerable as AI effectively detaches them from understanding and refining the theories underlying their business performance.

This will usher in a new period of enlightenment where companies adjust their be-all-end-all expectations of AI in favor of empowering their people to effectively coexist with AI. This will be good news for workers too, as Tyler Cowen suggests in Average is Over:

As intelligent-analysis machines become more powerful and more commonplace, the most obvious and direct beneficiaries will be the humans who are adept at working with computers and with related devices for communications and information processing. If a laborer can augment the value of a major tech improvement by even a small bit, she will likely earn well. This imbalance in technological growth will have some surprising implications. The key questions will be: Are you good at working with intelligent machines or not? If the answer is yes, then your wage and labor market prospects are likely to be cheery. If the answer is no, but you have an unusual ability to spot, recruit, and direct those who work well with computers, then the contemporary world will make you rich.

A short lesson on data

You can do a lot of things on the Internet, but whatever you do requires data. The Internet has a lot of data. Some say roughly 1,000,000,000,000,000,000,000,000 GB of data are available, but no one really knows the exact amount.

GB is short for gigabyte, or one billion bytes. We measure the size of data in bytes. One byte is equivalent to eight bits. You generally need between one and four of these bytes to represent a single letter of the alphabet, or about twenty of them for the average English word.
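
A few lines of Python make the arithmetic concrete (the specific encodings below are my illustration):

```python
# One byte is eight bits; a letter takes between one and four bytes
# depending on the text encoding used.
BITS_PER_BYTE = 8

letter_utf8 = "a".encode("utf-8")       # 1 byte for a plain letter
letter_utf32 = "a".encode("utf-32-be")  # 4 bytes for the same letter
word = "hello".encode("utf-8")          # 5 letters -> 5 bytes

word_bits = len(word) * BITS_PER_BYTE   # 40 bits for one small word
```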

Data needs to be stored and retrieved. Hard drives were designed for exactly this purpose. Twenty years ago it cost $259 to store one GB of data on a hard drive. Today it costs just a few pennies, even as people increasingly prefer storing their data directly on the Internet. This serves them well, considering their phones and tablets don’t even have traditional hard drives.
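
To see how dramatic that price drop is, here is a quick back-of-the-envelope calculation in Python (the photo size and today’s exact price are rough assumptions of mine):

```python
COST_PER_GB_THEN = 259.00   # dollars per GB, twenty years ago
COST_PER_GB_NOW = 0.03      # a few pennies per GB today (approximate)

def storage_cost(gigabytes, price_per_gb):
    """Dollars needed to store this much data on a hard drive."""
    return gigabytes * price_per_gb

photo_gb = 0.005  # one 5 MB photo, for illustration
cost_then = storage_cost(photo_gb, COST_PER_GB_THEN)  # about $1.30
cost_now = storage_cost(photo_gb, COST_PER_GB_NOW)    # a tiny fraction of a penny
```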

Data needs to be uploaded and downloaded on the Internet. And this requires a network connection that moves data to and from a computer and the Internet. Five years from now, the average Internet user will be transferring 37 GB a year through their Internet connections.

Data can be stolen. Before the Internet, a thief needed to be physically close to a computer in order to steal the data stored on its hard drive. After computers started connecting to the Internet, thieves could steal data from anywhere in the world.

You’re probably wondering why someone would steal another person’s data. Usually, it is to hurt them. Data is simply a recording of the things people think and do in their lives. Many times what a person thinks or does should remain private, or should only be shared with a very small group of trusted people. This is our basic right, and when a person steals our data they violate this right. Sometimes the people we know and want to share our data with also violate our privacy when they accidentally make it available to someone else.

Data also helps us make better decisions. You are probably thinking, “I make good decisions without needing data from the Internet,” and this is correct. You rely on your judgement and intuition to make good decisions, and this is how it should be. With data, however, you have a way of learning more about the facts that describe, explain, or even predict a problem you are facing. When you take this data and apply some fancy math to it, you have a powerful new tool to help you answer tough questions. And this is why the Internet is so powerful: it not only contains a lot of data (remember all the bytes I mentioned earlier), but it also has the fancy math tools that help people make better decisions.
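
Here is what that “fancy math” can look like at its very simplest, in a short Python sketch (the travel times are made up for illustration):

```python
# Recorded travel times in minutes for two routes to school.
route_a = [12, 15, 11, 14, 13]
route_b = [10, 22, 9, 25, 9]

def average(times):
    """The simplest piece of fancy math: an average."""
    return sum(times) / len(times)

# Intuition might favor route B on its best days,
# but the averages tell a fuller story.
best_route = "A" if average(route_a) < average(route_b) else "B"
```

Route B sometimes takes only 9 minutes, but on average it is slower; the data helps you see past a few lucky days.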

I know you are probably wondering, “If the Internet helps people make decisions, can it also decide for itself?” This is a great question, and it makes for a great story for another day.

The lesson here is that data is very important in our lives, and this will only increase as you grow older. Learn to protect your data so it can only be seen by the people you trust. Always rely on your judgement and intuition to make good decisions, and learn how to use data to make better and more informed decisions.

(for my two young daughters)



Simple is that ‘horse that left the barn’ but remains in your line of sight. Chase it down and the problem is solved. Apply best practices in horse management to ensure it doesn’t happen again.

Complicated is trickier. It’s that feeling of being ‘caught between a rock and a hard place.’ You’re aware of being unaware of how to get out. Nevertheless, you are confident that good survival practices will help you navigate out of this mess soon enough.

Complexity grows each second you ‘grab the bull by the horns.’ Your best bet is to try things, get a sense of what works, and repeat. If you succeed in taming the wild beast, remember to reflect on your experience, teasing out useful knowledge that will help you repeat this success in the future.


We hear these idioms every day because we encounter these types of problems every day. Understanding the category of problem we are solving is the first step toward effectively solving it.

The really interesting part is to get better at ordering complex problems, thereby diminishing their complexity, or increasing the order of complicated ones so they become simpler.



A recent New York Times article suggests that Rome is falling apart. This may come as no surprise, considering similar articles have suggested the same over the past decade. It is nonetheless strange news for a city and country blessed with forty-eight million tourists each year, drawn to stunning countrysides, beautiful cities, and cultural treasures.

My wife and I moved to Rome in 2009. I spent many summers in Italy as a child, but it is by living here that I realized the city dazzles almost entirely through the preservation and promotion of its past success. This has the net effect of pushing aside the practical everyday needs of Romans.

In Thomas Friedman’s much-talked-about book, The World is Flat, cities are compared to collaborative platforms for social and economic progress. As an IT professional, I can tell you that a technology platform’s value hinges on what it offers being fit-for-purpose and how it offers it being fit-for-use. Rome is prioritizing the preservation of storied relics over the renewal of everyday services. This makes the city a better fit for the purposes of its visitors than for those of its residents.

Similarly, no resident here will dispute the notion that Rome is increasingly unfit-for-use. There are many complex reasons for this; a video that went viral last week may offer a simple one. In it, bus driver Christian Rosso attributes the recent chaos in the city’s public transportation system to the large number of city buses parked in the garage awaiting maintenance. In other words, they are unfit for use, and this has exhausted the patience of Rome’s visitors and residents alike.

My point here is not to fuel the New York Times article and its ensuing firestorm. The fact of the matter is that Rome is one of the nicest cities you’ll ever visit. However, if the city is to become more valuable to current and future generations of tourists and residents, the mayor and his team need to propose services that satisfy the changing needs of its 21st-century residents. They equally need to ensure these services work and can be relied upon throughout the year by residents and non-residents alike.

Value is an atomic, all-or-nothing proposition. Uncovering it requires the wisdom and leadership to understand purpose, as well as the knowledge and management to ensure its uninterrupted availability.


Social Skills

LinkedIn recently introduced a “whole new way to understand the landscape of skills & expertise, who has them, and how it’s changing over time.” So essentially they have created a social network around knowledge worker skills. Although the site confuses skill with technology (e.g., Wii, Blackberry, and iPod as skills?), it nonetheless represents an innovative step towards better understanding skills and their relationship to the larger topic of competence (i.e. talent, skill, knowledge).

With LinkedIn Skills, I can now see the who, what, where, and when of a particular skill, which is in line with the people-oriented features we’ve come to expect from other social media technologies such as Twitter, Facebook, and Foursquare. With Skills, I can track the growth of a particular skill, determine which skills are on the up and up, and which should be dropped in favor of greener pastures. As the idea matures, I’m sure we’ll see commercial opportunities such as click to “Verify,” “Improve,” or “Share” your skill, but the precedent has been set. Skills are now first-class citizens in the world of social media technologies.


Live from Railsconf

Join me starting Monday, June 7, 2010 as I begin a week of live blogging from Railsconf in Baltimore, Maryland.

Click to see each day’s blog posts:

Monday, June 7, 2010
Tuesday, June 8, 2010
Wednesday, June 9, 2010
Thursday, June 10, 2010

All Railsconf 2010 keynotes are available on YouTube here.


Challenges in Multi-Core Era – Part 3

Previously, I compared the performance of today’s popular operating systems with respect to multi-core processors. In this final part of Challenges in Multi-Core Era, I’ll talk about the multi-core capabilities found in today’s programming languages and development tools.

The Programming Languages

When language architects were designing the foundations of today’s most popular programming languages, multi-core microprocessors were hidden in laboratories. Only high-performance servers had access to multiprocessing systems, and just a few specialized workstations had more than one CPU installed. Therefore, C# and Java offered support for concurrency and multi-threading intended to make applications more responsive. However, language architects didn’t design this support to optimize applications for future multi-core CPUs. Hence, it is now really difficult to optimize existing code to take advantage of multi-core CPUs using frameworks designed for serial code.

In order to take full advantage of multi-core, applications need task-based design and task-based programming. There is no silver bullet, and so far there is no way to optimize an application simply by recompiling it without changes, as some developers expected. There is a great need for new designs, new programming techniques, and new tools. The software development industry needs another great paradigm shift.
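
To give a flavor of what task-based programming means in practice, here is a minimal Python sketch (my illustration; the article’s context is C# and Java): the work is expressed as many small, independent tasks, and a pool maps them onto however many cores the machine offers.

```python
from concurrent.futures import ThreadPoolExecutor

def count_vowels(text):
    """One small, independent unit of work: a task."""
    return sum(ch in "aeiou" for ch in text)

documents = ["hello world", "multi core era", "task based design"]

# The pool, not the programmer, decides how tasks map onto cores.
# (Illustrative only: CPU-bound Python code would need processes
# rather than threads to get a real multi-core speedup.)
with ThreadPoolExecutor() as pool:
    counts = list(pool.map(count_vowels, documents))
```

The point is the shape of the program: no explicit threads, just a bag of tasks handed to a scheduler.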

In addition, developers need a framework capable of handling tasks. New programming languages, and renewed interest in older concepts like functional programming, are emerging to meet this need. Functional programming makes it easier to code task-based designs and to split the work into multiple independent tasks that can run in parallel on multi-core CPUs.

There are many new programming languages with a strong focus on functional programming, prepared to take full advantage of multi-core and to offer great scalability. Just to mention a few:

  • Scala
  • Haskell. Yes, the language that is more than twenty years old. Pure functional programming languages are back, and they could be the future of parallel programming.
  • Microsoft Axum (formerly Maestro).
  • Microsoft F#.

However, do developers want to begin learning new programming languages? Most developers want to leverage their existing knowledge to move onto multi-core programming.

C++ and Fortran programmers had early access to parallel programming. Nowadays, these are the only programming languages that can take advantage of the full power offered by modern microprocessors. C++ is closer to the hardware; hence, it allows many optimizations that aren’t available in any other programming language, apart from C and assembler. You can access all the vectorization capabilities from C++ and Fortran.

OpenMP has been offering a high-quality, multi-platform, open shared-memory parallel programming API for C/C++ and Fortran for many years now. In addition, Intel Threading Building Blocks, also known as TBB, allows developers to express parallelism in C++ applications to take advantage of multi-core.

Message Passing Interface (MPI) is a language-independent communications protocol used to program parallel computers. You can use MPI to take advantage of multi-core from many programming languages. However, its main focus is helping develop applications that run on clusters and high-performance computers.

Intel offers compilers and libraries optimized to take advantage of multi-core. However, you still have to code your applications with new parallel designs in mind. The use of vectorization in their math libraries is a great plus. There is an outstanding opportunity for new libraries and components optimized for multi-core and vectorization; parallelism brings new opportunities to the software development industry.

There are new companies taking advantage of the need for multi-core optimizations, like Cilk Arts with its Cilk++ compiler. It is based on GCC and includes a modified compiler and debugger to simplify multi-core programming on Linux and Windows platforms.

Mac OS X’s Xcode development environment offers access to Grand Central Dispatch and OpenCL. OpenCL allows C programs to run on the GPU instead of loading the main CPU. Developer interest in Xcode has really grown since the arrival of multi-core and OpenCL.

C# and Java are evolving to offer developers new ways of expressing parallelism in their code. Indeed, they are changing many aspects that were designed for another world, that of the old single-core machines. Some of these changes include new garbage collectors, new frameworks and features, new functional approaches, and task-based programming capabilities.

Java 7 will offer the new fork-join framework, really optimized for multi-core.

C# 4.0 (Visual Studio 2010) will add task-based programming capabilities and parallelized LINQ (PLINQ). It will also let developers manage the desired degree of parallelism.

Furthermore, there are new DSLs (Domain Specific Languages) to add parallel programming capabilities to existing high-level languages. For example, GParallelizer adds nice parallel programming capabilities to Groovy.

Most modern programming languages are evolving or adding interoperability capabilities with other languages to favor multi-core programming.

However, don’t forget about vectorization. Mono, a free and open source .NET implementation, offers access to SSE3 and SSE4 for C# developers.

A few years ago, concurrency was about threads. Now, experts are talking about tasks and fibers. Why? Because in order to develop an application using a task based approach, threads are too heavy. Tasks and fibers are lightweight concurrency elements, much lighter than threads. They allow developers to implement complex task-based designs without the complexities of threads.
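
Python’s asyncio tasks illustrate the same idea (a sketch of mine, not the C# or Java APIs discussed in the text): thousands of lightweight tasks are cheap to create where the same number of OS threads would be prohibitively heavy.

```python
import asyncio

# Ten thousand lightweight tasks: trivial for an event loop,
# far too heavy if each one were an OS thread.

async def worker(n):
    await asyncio.sleep(0)  # yield control back to the scheduler
    return n * n

async def main():
    tasks = [asyncio.create_task(worker(n)) for n in range(10_000)]
    results = await asyncio.gather(*tasks)
    return sum(results)

total = asyncio.run(main())
```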

Ruby 1.9 added fibers, and they simplify the creation of pipelines. Pipelines take great advantage of Hyper-Threading combined with multi-core.
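
Python generators express the same pipeline idea as Ruby’s fibers; each stage lazily pulls values from the previous one (an illustrative sketch, not Ruby code):

```python
# A three-stage pipeline: produce -> transform -> filter.
# Each stage is a lightweight coroutine-like generator that
# yields values on demand to the next stage.

def numbers(limit):
    for n in range(limit):
        yield n

def squares(stream):
    for n in stream:
        yield n * n

def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n

pipeline = evens(squares(numbers(10)))
result = list(pipeline)  # values flow through all stages lazily
```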

If you want to go parallel, follow James Reinders’ eight key rules for multi-core programming. You can apply them to any combination of programming language, framework, compiler and virtual machine.

Java 7 and .NET 4 will not offer framework support for vectorization (SIMD instructions), even though SIMD has been available since the Pentium MMX microprocessor. This decision doesn’t make sense, especially considering the huge market for professionals skilled in the new parallelism age.

The New Tools

A well-known proverb says “A good workman is known by his tools”.

A parallelized application requires new debugging and testing techniques. You need to catch potential bugs introduced by concurrency.

Intel has been offering tools for high-performance computing and parallelism for many years now. A few weeks ago, Intel launched one of the most complete parallel toolkits for C/C++ developers, Intel Parallel Studio. Among many other features, it helps developers compile applications tuned for multi-core CPUs and find concurrency-specific bugs and bottlenecks. You should expect to see more tools like this in the coming quarters.

Visual Studio 2010 will add enhanced multi-monitor support; you’ll want more than one monitor when debugging applications built with a task-based approach. It will also add task-based debugging capabilities. However, Visual Studio 2010 has only recently entered Beta 1. Therefore, if you want to develop an application using a task-based approach in C# 3.0, you still have to work with threads. Visual Studio 2008 offers nice multithreading debugging capabilities.

Most IDEs are changing to offer new task-based programming, debugging, and testing capabilities. You have to test parallelized applications on multi-core CPUs, because many bugs won’t appear when running them on single-core CPUs.

There are many free tools to help you in the multi-core jungle. You can monitor your applications and test their concurrency efficiency using Process Explorer and Intel Concurrency Checker. If you use these tools to check commercial software, you’ll see the need for developers who understand multi-core. You’ll also see a lot of opportunities in the multi-core age.

By the way, multi-core programming has a high-quality weekly talk show led by Intel experts, Parallel Programming Talk.


While the old free lunch is over, the industry is reshaping itself to take advantage of the new microprocessor architectures of today and tomorrow.

Hardware will continue to advance and offer more parallel processing capabilities, even though the software industry is moving more slowly than expected.

The bottom line is that multi-core seems to be a really sustainable competitive advantage, one that requires a great paradigm shift from developers throughout the software lifecycle. There is light at the end of the tunnel. Are you ready to reach it?

About the author: Gaston Hillar has more than 15 years of experience in IT consulting, IT product development, IT management, embedded systems, and computer electronics. He has been actively researching parallel programming, multiprocessor, and multicore technologies since 1997. He is the author of more than 40 books about computer science and electronics.

Gaston is currently focused on tackling the multicore revolution, researching new technologies, and working as an independent IT consultant and freelance author. He contributes to Dr. Dobb’s Parallel Programming Portal, http://www.go-parallel, and is a guest blogger at Intel Software Network.

Gaston holds a Bachelor’s degree in Computer Science and an MBA.


Challenges in Multi-Core Era – Part 2

Previously, I talked about the evolution of microprocessors and specialized hardware since the widespread adoption of multi-core began a few years ago. In this second part of Challenges in Multi-Core Era, I’ll compare multi-core capabilities across today’s popular operating systems.

The Operating Systems

No matter the version, Mac OS has always had a great advantage over any other desktop operating system: it knows the underlying hardware exactly, because it is designed to run only on Apple’s certified hardware. (You can run it on different hardware, at your own risk.) The same company develops the computers and the operating system; leaving its great innovation aside, this is its great secret. For this reason, it can be tuned to take full advantage of specific hardware. For example, the latest versions of Mac OS X running on Intel microprocessors take advantage of vectorization, using SSE (Streaming SIMD Extensions) and SSE2. In fact, Apple has been promoting vectorization and SIMD (Single Instruction, Multiple Data) instructions on its Developer Connection website.

However, Mac OS X Snow Leopard is taking another great step by offering Grand Central Dispatch. Nonetheless, there is a big problem: there are too few developers working with Mac-specific developer tools. Mac is also moving to the 64-bit arena.

FreeBSD is one of the free and open source operating systems that has always offered great features when working with multiprocessor systems. FreeBSD 7 works great with multi-core CPUs as well, which is why many high-performance servers around the world trust FreeBSD’s scheduler.

The key is the operating system scheduler. It is responsible for distributing work across the physical and logical processing cores and assigning processing time to each concurrent thread, which is a really complex task. For example, an operating system running on an Intel Core i7 CPU sees eight logical processing cores (it can run eight hardware threads concurrently) but only four physical cores. It has to distribute dozens of threads onto the time slices available from eight logical cores. Thus, the scheduler’s efficiency impacts application performance; an inefficient scheduler can ruin a highly parallelized application’s performance.
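
The physical-versus-logical distinction is just multiplication, as this short Python snippet shows (a sketch; `os.cpu_count()` reports the logical cores the scheduler sees on whatever machine actually runs it):

```python
import os

# A Core i7 of this era: 4 physical cores, each running
# 2 hardware threads (Hyper-Threading).
PHYSICAL_CORES = 4
THREADS_PER_CORE = 2
logical_cores = PHYSICAL_CORES * THREADS_PER_CORE  # 8

# What the OS scheduler on the current machine sees
# (may be None on exotic platforms).
machine_logical = os.cpu_count()
```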

Linux works great with multi-core CPUs. FreeBSD works better, but Linux still does a great job. However, many desktop applications running on Linux GUIs aren’t optimized to take full advantage of multi-core. Nevertheless, Linux running as a web server in a classic LAMP configuration is really finely tuned for multi-core. Free Linux versions have a great advantage in the multi-core world, as they don’t limit the number of cores through expensive licenses. Therefore, you can upgrade your desktop or server without worrying about the number of cores supported by your operating system license.

Both FreeBSD and Linux have a great advantage over Mac and Windows: most new deployments of these operating systems use the 64-bit versions. Applications running in 64-bit mode offer more scalability than their 32-bit counterparts. Parallel algorithms require more memory than serial ones, and most operating systems running in 32-bit mode can only address 4 GiB. There are some techniques to work with more memory even in 32-bit mode, but they reduce memory access performance.

4 GiB might seem like a lot of memory. Nevertheless, because the memory map is a bit complex, some address space is reserved for other purposes, like writing data to video memory and other buffers. Hence, the operating system cannot use the whole 4 GiB of main memory. Furthermore, some operating systems limit the maximum memory addressable by an application in 32-bit mode (2 GiB in standard Windows configurations). Again, there are techniques that let applications work with more memory.
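
The 4 GiB ceiling follows directly from pointer width, as a couple of lines of Python show:

```python
# A 32-bit pointer can address at most 2**32 distinct bytes.
address_bits = 32
addressable_bytes = 2 ** address_bits
gib = addressable_bytes / 2 ** 30  # 4.0 GiB total address space

# With Windows' default 2 GiB user/kernel split, a single
# process sees only half of that.
user_space_gib = gib / 2
```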

When working with large images, videos, and databases, these limits can be a serious obstacle to scalability. More cores mean more processes, more threads, more tasks, or more fibers, and a 2 GiB memory limit for a single process can mean a great I/O bottleneck.

Hence, working with 64-bit is a great step ahead. You have the same driver problems in both 32-bit and 64-bit Linux, so you can install 64-bit Linux without worrying about additional problems. Working in 64-bit mode, you can scale as the number of cores increases without worrying about address space limits. Of course, you do have to worry about available physical memory, but that’s another problem.

FreeBSD and Linux schedulers already offer excellent support for NUMA, and you may see nice improvements in future kernel versions. The idea is simple: the more cores there are, the more optimizations an efficient scheduler requires.

Now, let’s move to Windows wonderland. Windows has a great disadvantage: each version offers different capabilities related to multi-core. You should check the maximum number of cores and the maximum memory supported by each Windows version. This doesn’t mean that the newest Windows versions aren’t capable of handling multi-core CPUs; the problem is that they have different licenses and prices.

If you want scalability, you shouldn’t consider 32-bit versions anymore. Windows Server 2008 R2 and Windows 7 will support up to 256 logical processor cores. Nowadays, 256 logical cores seems like a huge number. However, an Intel Core i7 is a quad-core CPU, yet it offers eight logical cores (2 logical cores per physical core, 2 x 4 = 8). New CPUs with 8 physical cores are around the corner, and they will offer 16 logical cores (2 x 8 = 16). Hence, you will see 16 graphs in your CPU activity monitor. An operating system supporting up to 256 logical cores really makes sense for the forthcoming years.

Despite being criticized everywhere, Windows Vista offered nice scheduler optimizations for multi-core. Windows Server 2008 R2 and Windows 7 will offer support for NUMA in their 64-bit versions. However, you must use the new functions to take full advantage of NUMA capabilities.

No matter the operating system, applications must be prepared to take advantage of multi-core CPUs. The operating systems are adding multi-core optimizations; however, except on Mac OS X, most applications running on them take advantage of neither multi-core nor vectorization. I really cannot understand this.

I do believe it is time to learn from Mac OS X. For example, Windows 7 could offer a special version; let’s call it 7 Duo. 7 Duo would require at least a dual-core CPU with SSE2, so you wouldn’t be able to run it on older machines. If you have newer hardware, you’d buy and install 7 Duo, and the operating system would load much faster, taking full advantage of your modern hardware. Your favorite web browser should take advantage of multi-core and vectorization when parsing HTML or XML. Check this white paper: Parallelizing the Web Browser.

The great problem with PCs (the x86 family) is backward compatibility. It’s been an incredible advantage over the last decades, but it is time to take advantage of modern hardware by establishing baselines. The same goes for Linux; I’d love to install an Ubuntu Duo on my notebook.

Operating systems are really crucial to tackling the multi-core revolution. Their schedulers are very important in transforming multi-core power into application performance. Vectorization and SIMD are also important, and most applications are not using them. It seems logical to develop new operating system versions designed to take full advantage of really modern hardware. They would add real value to your notebooks, desktops, workstations, and servers.

Click here for part 3, where I compare the new capabilities of programming languages and development tools with respect to multi-core processors.

About the author: Gaston Hillar has more than 15 years of experience in IT consulting, IT product development, IT management, embedded systems, and computer electronics. He has been actively researching parallel programming, multiprocessor, and multicore technologies since 1997. He is the author of more than 40 books about computer science and electronics.

Gaston is currently focused on tackling the multicore revolution, researching new technologies, and working as an independent IT consultant and freelance author. He contributes to Dr. Dobb’s Parallel Programming Portal, http://www.go-parallel, and is a guest blogger at Intel Software Network.

Gaston holds a Bachelor’s degree in Computer Science and an MBA.


Challenges in Multi-Core Era – Part 1

A few years ago, in 2005, Herb Sutter published an article in Dr. Dobb’s Journal, “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software.” He talked about the need to start developing software with concurrency in mind to fully exploit continuing exponential gains in microprocessor throughput.

Here we are in 2009, more than four years after Sutter’s article was published. What’s going on? How are we doing? How did the industry evolve to tackle the multi-core revolution?

In this three-part series, we’ll answer these questions by exploring the recent multi-core-inspired evolution of components throughout the application stack, including microprocessors, operating systems and development platforms.

The New Microprocessors

Microprocessor manufacturers keep adding processing cores. Most machines today have at least a dual-core CPU, and quad-core CPUs are quite popular in servers and advanced workstations. More cores are around the corner.

There is a new free lunch. If your application is designed to take advantage of multi-core and multiprocessor systems, it will scale as the number of cores increases.

Some people say multi-core isn’t useful. Take a look at this simple video. It runs four applications (processes) at the same time on a quad-core CPU. Each application runs on a different physical processing core, as shown in the CPU usage history graph (one independent graph per core). Hence, running four applications takes nearly the same time as running just one: running one application takes 6 seconds, and running four applications takes 7 seconds. What you see is what you get; there are no tricks. Multi-core offers more processing power, and it is really easy to test this. However, most software wasn’t developed to take advantage of these parallel architectures within a single application.

There is another simple video showing one application running on a quad-core CPU. First, it runs using a classic, old-fashioned serial programming model, so it uses only one of the four available cores, as shown in the CPU usage history graph. Then the same application runs in a parallelized version, taking less time to do the same job.

In recent years, parallel hardware became the mainstream standard in most developed countries. The great problem is that hardware evolved much faster than software, leaving a large gap between the two. Microprocessors added new features that software developers didn’t exploit. Why did this happen? Because exploiting them was very complex. By the way, it’s still a complex task. I’ll get back to this later.

Meanwhile, the most widespread model for multiprocessor support, SMP (Symmetric Multi-Processor), is ceding the pole position to NUMA (Non-Uniform Memory Access). With SMP, every processor has equal access to memory and I/O, so the shared processor bus becomes a limitation to future scalability. With NUMA, each processor accesses the memory close to it faster than the memory that is farther away. NUMA offers better scalability when there are more than four processors.

With NUMA, computers have more than one system bus, and each available system bus serves a set of processors. Hence, each set of processors can access its own memory and its own I/O channels. The sets are still capable of accessing the memory owned by the other sets of processors, with appropriate coordination schemes. However, it is obviously more expensive to access the memory owned by other sets (foreign NUMA nodes) than to work with the memory reached through the local system bus (the NUMA node’s own memory).

Therefore, NUMA hardware requires different kinds of optimizations. Applications have to be aware of the NUMA hardware and its configuration, so they can schedule concurrent tasks and threads that access the same memory regions on the same NUMA node. Applications must avoid expensive remote memory accesses, and they have to favor concurrency that takes memory locality into account.

The new free lunch offers manycore scalability. Expect more cores in the coming months and years. Learn about the new microprocessors, be aware of NUMA and optimize your applications for these powerful new architectures.

The New Specialized Hardware

On the one hand, we have a lot of software that is not taking full advantage of the available hardware power. On the other hand, many manufacturers are developing additional hardware to offload processing from the main CPU. Does this make sense?

It means you are wasting watts all the time because you’re running software that doesn’t exploit the CPU, and to solve that problem you add additional, expensive hardware to free CPU cycles, while entire cores sit unused.

TCP/IP Offload Engine (TOE) uses a more powerful microprocessor on the NIC (Network Interface Card) or HBA (Host Bus Adapter) to process TCP/IP over Ethernet in dedicated hardware. This technique eliminates the need to process TCP/IP in software running on the operating system and consuming cycles from the main CPU. It sounds really attractive, especially when working with 10 Gigabit Ethernet and iSCSI.

CPUs keep adding cores, and so far most software is not taking full advantage of them. Yet you still need new specialized hardware to handle the network I/O. Most drivers don’t even exploit the older parallel processing capabilities based on SIMD (Single Instruction, Multiple Data) offered since the arrival of the Pentium MMX. TCP/IP Offload Engine is a great idea. However, if I own a quad-core CPU with outstanding vectorization capabilities (SSE4.2 and earlier versions), I’d love my TCP/IP stack to take advantage of it.

Vectorization based on SIMD allows a single CPU instruction to process multiple data elements at the same time, which can speed up the execution of complex algorithms many times over. For example, an encryption algorithm requiring thousands of CPU cycles could produce the same results in less than a quarter of those cycles using vector instructions.

Something pretty similar happens with games. Games are always asking for new GPUs, yet most games take advantage of neither the multi-core nor the vectorization capabilities offered by modern CPUs. I don’t want to buy new hardware because of software inefficiencies. Do you?

Modern GPUs (Graphics Processing Units) are very powerful and offer outstanding processing power. There are programming models, like CUDA and OpenCL, that let software developers use a GPU like a CPU, running general-purpose code on the GPU to relieve the main CPU of that load. It sounds really attractive. However, again, most software does not take full advantage of even multi-core CPUs, so it seems difficult for commercial and mainstream software to embrace the possibilities offered by these modern and quite expensive GPUs. Most modern notebooks don’t include such GPUs. Therefore, I see many limitations to this technique.

Before considering these powerful but limited capabilities, it seems logical to exploit the main CPU’s full processing capabilities. Most modern notebooks offer dual-core CPUs.

Specialized hardware is very interesting indeed. However, it isn’t available in every modern computer. It seems logical to develop software that takes full advantage of all the power and instruction sets offered by modern multi-core CPUs before adding more specialized and expensive hardware.

In part two of Challenges in Multi-Core Era, I’ll compare the multi-core capabilities of the latest operating systems.

