What do you do if you want your web site to run faster, or need more processing capacity, without adding web hosting servers?

Multi-core CPUs have clearly been the answer offered by chip makers like Intel and AMD over the past couple of years. As operating systems and desktop software continue to bloat, multi-core CPUs offer a more cost-effective and power-efficient alternative to traditional multi-CPU configurations. Intel is now shipping Core 2 Quad processors, and Sun has been marketing four- and eight-core "CoolThreads" SPARC-based systems for a year or more, touting their lower power consumption and higher performance compared with similarly equipped multi-CPU solutions from other vendors.

The problem with traditional single-core CPUs is that the only way to get more performance from the same chip design (other than scaling through costly n-way motherboard architectures) was to increase the clock frequency. CPU power consumption, however, generally rises much faster than linearly as clock frequency increases.

Figure: AMD CPU power consumption vs. clock frequency

This tends to occur because, as clock frequency increases, more voltage must be applied to get transistors to switch on and off at higher speeds (most overclockers have hands-on experience with this). As the voltage used to drive the on-off transitions of transistors in the CPU increases, the transistors experience increased "leakage current", or current that takes alternate paths through the transistor, resulting in increased heat, current consumption, and overall power dissipation. In terms of power consumption, it is far more efficient to run a CPU at a slightly lower clock frequency (say 0.8 * n, where n is the maximum supported clock frequency) than at the maximum frequency at which the CPU (or core) can operate.

The answer? Multiple cores per CPU running at slower speeds. This approach allows almost linear scalability in small form factors with substantially lower power per CPU cycle.
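To put rough numbers on that trade-off, here is a minimal Python sketch. It relies on the common textbook approximation that dynamic power is proportional to C * V^2 * f, and it assumes (this is not from the figures above) that supply voltage scales linearly with clock frequency, that leakage is ignored, and that work divides perfectly across cores. The function names are made up for illustration.

```python
# Toy model of dynamic CPU power: P is proportional to C * V^2 * f.
# Assumptions (illustrative only): supply voltage scales linearly with
# clock frequency, leakage power is ignored, and work splits perfectly
# across cores.

def relative_power(freq_ratio: float, cores: int = 1) -> float:
    """Power relative to one core running at its maximum frequency."""
    voltage_ratio = freq_ratio  # assumed linear voltage/frequency scaling
    return cores * voltage_ratio ** 2 * freq_ratio


def relative_throughput(freq_ratio: float, cores: int = 1) -> float:
    """Ideal throughput relative to one core at its maximum frequency."""
    return cores * freq_ratio


if __name__ == "__main__":
    configs = [
        ("1 core  @ 100%", 1.0, 1),
        ("1 core  @  80%", 0.8, 1),
        ("2 cores @  80%", 0.8, 2),
    ]
    for label, freq, cores in configs:
        power = relative_power(freq, cores)
        work = relative_throughput(freq, cores)
        print(f"{label}: power {power:.2f}, throughput {work:.2f}, "
              f"work per unit power {work / power:.2f}")
```

Under these simplified assumptions, two cores at 80% of the maximum clock draw roughly the same power as one core at full speed while offering up to 60% more throughput. Real chips will differ, since voltage does not track frequency exactly and leakage is not zero.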

Intel and other companies like ClearSpeed aim to take this multi-core approach to unheard-of levels. Intel has recently demonstrated a prototype 80-core CPU capable of more than one teraflop (one trillion floating-point operations per second). That's more processing on one multi-core CPU than 1984's fastest room-sized supercomputer, the Cray X-MP, delivered. Scaling performance linearly tends to rely on application software being written to take advantage of multiple threads of execution, but work is being done to allow operating systems, and even the CPUs themselves, to intelligently divide work on behalf of application software.
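Why the software matters so much can be illustrated with Amdahl's law, a standard rule of thumb that is not specific to any of the chips mentioned here: if only a fraction of a program can run in parallel, the serial remainder caps the speedup no matter how many cores are added. The sketch below is a minimal Python version; the function name and the 95% figure are chosen purely for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup from Amdahl's law.

    parallel_fraction: share of the run time that can be spread across
    cores (0.0 to 1.0); the rest is assumed to run on a single core.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 95% of the work can run in parallel.
    for cores in (1, 2, 4, 8, 80):
        print(f"{cores:>2} cores: {amdahl_speedup(0.95, cores):.2f}x")
```

With 5% of the work stuck on one core, the speedup can never exceed 20x, however many cores are available, which is why an 80-core chip only pays off for software that is written to be almost entirely parallel.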