Previously I talked about the evolution of microprocessors and specialized hardware since the widespread adoption of multi-core began a few years ago. In this second part of Challenges in the Multi-Core Era, I’ll compare the multi-core capabilities of today’s popular operating systems.
The Operating Systems
No matter the version, Mac OS has always had a great advantage over every other desktop operating system: it knows exactly which hardware lies underneath, because it is designed to run only on Apple’s certified hardware. (You can run it on other hardware, at your own risk.) The same company develops both the computers and the operating system. Leaving its many innovations aside, this is its great secret. For this reason, it can be tuned to take full advantage of specific hardware. For example, the latest versions of Mac OS X running on Intel microprocessors take advantage of vectorization through SSE (Streaming SIMD Extensions) and SSE2. In fact, Apple has been promoting vectorization and SIMD (Single Instruction Multiple Data) instructions on its Developer Connection website.
However, Mac OS X Snow Leopard takes another great step, offering Grand Central Dispatch. Nonetheless, there is a big problem: there are too few developers working with Mac-specific developer tools. Mac OS is also moving into the 64-bit arena.
FreeBSD is one of the free and open-source operating systems that has always offered great features on multiprocessor systems, and FreeBSD 7 works great with multi-core CPUs as well. That is why many high-performance servers around the world rely on FreeBSD’s scheduler.
The key is the operating system scheduler. It is responsible for distributing work across the physical and logical processing cores and assigning processing time to each concurrent thread. It performs a really complex task. For example, an operating system running on an Intel Core i7 CPU sees eight logical processing cores (it can run eight hardware threads concurrently) but only four physical cores. It has to distribute dozens of threads across the time slices available on those eight logical cores. Thus, the scheduler’s efficiency directly impacts the application’s performance; an inefficient scheduler can ruin the performance of a highly parallelized application.
Linux also does a great job with multi-core CPUs; FreeBSD’s scheduler is arguably better, but Linux handles them well. However, many desktop applications running on Linux GUIs aren’t optimized to take full advantage of multi-core. Nevertheless, Linux running as a Web server in a classic LAMP configuration is really fine-tuned for multi-core. Free Linux distributions have a great advantage in the multi-core world, as they don’t limit the number of cores through expensive licenses. Therefore, you can upgrade your desktop or server without worrying about the number of cores supported by your operating system license.
Both FreeBSD and Linux have a great advantage over Mac and Windows: most new deployments of these operating systems use the 64-bit versions. Applications running in 64 bits offer more scalability than their 32-bit counterparts, and parallel algorithms require more memory than serial ones. Most operating systems running in 32 bits can only address 4 GiB. There are techniques, such as PAE (Physical Address Extension), to work with more memory even in 32 bits. However, they reduce memory access performance.
4 GiB could seem like a lot of memory. Nevertheless, the memory map is a bit complex; some of the address space is needed for other purposes, like video memory and other device buffers. Hence, the operating system cannot use the whole 4 GiB for main memory. Furthermore, some operating systems limit the maximum memory addressable by a single application in 32-bit mode (2 GiB in standard Windows configurations). Again, there are techniques that let applications work with more memory.
When working with large images, videos and databases, these limits can be a serious obstacle to scalability. More cores mean more processes, more threads, more tasks or more fibers, and a 2 GiB memory limit for a single process can create a great I/O bottleneck.
Hence, working in 64 bits is a great step ahead. You have the same driver issues in both 32-bit and 64-bit Linux, so you can install 64-bit Linux without worrying about additional problems. Working in 64 bits, you can scale as the number of cores increases without worrying about address space limits. Of course, you do have to worry about available physical memory. However, that’s another problem.
The FreeBSD and Linux schedulers already offer excellent support for NUMA (Non-Uniform Memory Access), and you may see nice improvements in future kernel versions. The idea is simple: the more cores there are, the more optimizations an efficient scheduler requires.
Now, let’s move to the Windows wonderland. Windows has a great disadvantage: each version offers different capabilities related to multi-core. You should check the maximum number of cores and the maximum memory supported by each Windows version. This doesn’t mean that the newest Windows versions aren’t capable of handling multi-core CPUs; the problem is that they have different licenses and prices.
If you want scalability, you shouldn’t consider 32-bit versions anymore. Windows Server 2008 R2 and Windows 7 will support up to 256 logical processor cores. Nowadays, 256 logical cores seems a huge number. However, consider that an Intel Core i7 is a quad-core CPU, yet it offers eight logical cores (2 logical cores per physical core, 2 x 4 = 8). New CPUs with 8 physical cores are around the corner, and they will offer 16 logical cores (2 x 8 = 16). Hence, you will see 16 graphs in your CPU activity monitor. An operating system supporting up to 256 logical cores really makes sense for the forthcoming years.
Despite being criticized everywhere, Windows Vista offered nice scheduler optimizations for multi-core. Windows Server 2008 R2 and Windows 7 will offer support for NUMA in their 64-bit versions. However, you must use the new API functions to take full advantage of NUMA capabilities.
No matter the operating system, applications must be prepared to take advantage of multi-core CPUs.
Operating systems are adding multi-core optimizations. However, except on Mac OS X, most applications take advantage of neither multi-core nor vectorization. I really cannot understand this.
I do believe it is time to learn from Mac OS X. For example, Windows 7 should offer a special version; let’s call it 7 Duo. 7 Duo could require at least a dual-core CPU with SSE2, so you wouldn’t be able to run it on older machines. If you have newer hardware, you’d buy and install 7 Duo. The operating system would load much faster, taking full advantage of your modern hardware. Your favorite Web browser should take advantage of multi-core and vectorization when parsing HTML or XML. Check this white paper, Parallelizing the Web Browser.
The great problem with PCs (the x86 family) is backward compatibility. It has been an incredible advantage over the last decades. However, it is time to take advantage of modern hardware by establishing baselines.
The same happens with Linux. I’d love to install an Ubuntu Duo on my notebook.
Operating systems are really crucial to tackling the multi-core revolution. Their schedulers are very important in transforming multi-core power into application performance. Vectorization and SIMD are also important, and most applications are not using them. It seems logical to develop new operating system versions designed to take full advantage of really modern hardware. They would add real value to your notebooks, desktops, workstations and servers.
Click here for part 3 where I compare the new capabilities of programming languages and development tools with respect to multi-core processors.
About the author: Gaston Hillar has more than 15 years of experience in IT consulting, IT product development, IT management, embedded systems and computer electronics. He has been actively researching parallel programming, multiprocessor and multi-core technologies since 1997. He is the author of more than 40 books about computer science and electronics.
Gaston is currently focused on tackling the multi-core revolution, researching new technologies and working as an independent IT consultant and freelance author. He contributes to Dr. Dobb’s Parallel Programming Portal, http://www.go-parallel, and is a guest blogger at Intel Software Network, http://software.intel.com.