Verdict: Formerly known as Penryn - now known as awesome.
It's hard to think of something that Intel has done wrong this year; the company has released a massive selection of aggressively priced processors, ranging all the way from the bargaintastic Pentium E21-series to the ludicrously overclockable and energy-efficient G0-stepping Core 2 Quad Q6600. Meanwhile, AMD has struggled to get a single quad-core desktop CPU out the door, only managing to launch a few Barcelona architecture Opterons last month. Yet even though Intel seems to be left to compete with itself, it's already shipping its first 45nm CPU this month - the Core 2 Extreme QX9650.
The benefits of smaller transistors are legion, but producing small transistors hasn't been easy for Intel. Intel hasn't only scaled down the transistors in the QX9650, though; it has also tweaked the architecture. This revised architecture is known as Penryn, and will make its appearance not only in the QX9650, but also in a whole series of mid-range Core 2 Duo and Xeon CPUs over the next few months. Penryn isn't an entirely new architecture, however - it's an evolution of the Core architecture.
The first major area of improvement is the radix divider, the part of the CPU that performs division, which the hardware works through as a series of shift-and-subtract steps. In a Core architecture CPU, the divider is known as radix-4 and can calculate two bits per iteration. The Penryn architecture introduces a new 68-bit CSA/CPA divider known as radix-16 that can calculate four bits per iteration.
As a result, programming loops with lots of division-type calculations, such as ray tracing (a method of rendering super-realistic graphics), can run up to twice as fast.
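To see why doubling the bits per iteration matters, here is a minimal Python sketch of a shift-and-subtract divider that retires a configurable number of quotient bits per step. It is an illustration only, not Intel's design: the function name `divide_radix` is hypothetical, the 68-bit width is simply borrowed from the description above, and a real hardware divider picks each quotient digit with dedicated logic rather than a subtraction loop.

```python
def divide_radix(dividend, divisor, bits_per_step, width=68):
    """Shift-and-subtract division retiring `bits_per_step` quotient bits per iteration.

    Returns (quotient, remainder, iterations). Illustrative model only.
    """
    if divisor <= 0 or dividend < 0 or dividend >= (1 << width):
        raise ValueError("expects 0 <= dividend < 2**width and divisor > 0")

    digit_mask = (1 << bits_per_step) - 1
    steps = -(-width // bits_per_step)  # ceiling division: iterations needed
    quotient = 0
    remainder = 0

    for step in range(steps):
        # Bring down the next group of dividend bits, most significant first.
        shift = (steps - 1 - step) * bits_per_step
        remainder = (remainder << bits_per_step) | ((dividend >> shift) & digit_mask)

        # Find this step's quotient digit by repeated subtraction.
        # (Assumption: hardware would select the digit in one go instead.)
        digit = 0
        while remainder >= divisor:
            remainder -= divisor
            digit += 1

        quotient = (quotient << bits_per_step) | digit

    return quotient, remainder, steps


# Radix-4 retires 2 bits per step, radix-16 retires 4 bits per step.
print(divide_radix(1000, 7, bits_per_step=2))  # (142, 6, 34)
print(divide_radix(1000, 7, bits_per_step=4))  # (142, 6, 17)
```

Both calls give the same quotient and remainder, but the radix-16 version needs 17 iterations rather than 34. That halving is where the 'twice as fast' figure for division comes from, although how much a whole program gains depends on how much of its time is spent dividing.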
The Penryn architecture also introduces a new feature known as the Super Shuffle Engine, which improves the performance of SSE instructions without software having to be recompiled. Closely tied to the Super Shuffle Engine is the introduction of SSE4, which adds a further 47 instructions to the x86 instruction set. These are designed to perform certain tasks using just one operation rather than several, and most are aimed at media workloads such as video encoding and photo editing. However, an application must be compiled with an SSE4-compliant compiler in order to reap the benefits - even current Intel application compilers don't support SSE4 yet, so application support is some way off.
Penryn architecture CPUs also have an improved cache architecture that now allows misaligned store results to be forwarded to a load. This should reduce cache latency - in a Penryn-based CPU, misaligned store commands can be carried out without having to wait for the whole cache to refresh. Not only that, but Penryn architecture CPUs also have more Level 2 cache than previous CPUs. Dual-core Penryn architecture CPUs have 6MB of shared Level 2 cache, as opposed to 4MB for top-end Core-based dies. Therefore, as a quad-core Penryn CPU still comprises two dual-core CPUs 'stapled' together, this Penryn quad-core CPU has a massive 12MB Level 2 cache rather than the 8MB previously seen.
Performance
The Core 2 Extreme QX9650 is clocked at 3GHz, the same clock speed as the previous Core 2 Extreme, the 65nm QX6850. Both CPUs also have a 1,333MHz FSB, and are rated at 130W TDP (Thermal Design Power, which is the amount of heat that a heatsink must dissipate). This makes it easy for us to compare just how much of an improvement the new architecture tweaks and extra cache provide over the previous Core architecture.
In our Media Benchmarks, the QX9650 is 6 per cent faster than the QX6850, although Supreme Commander ran at the same speed. Next, we tested both CPUs with CineBench R10, which is a ray-tracing benchmark based on the professional 3D modelling and animation package Cinema 4D. Again, the QX9650 was 6 per cent faster than the QX6850, and it was also 6 per cent faster in SuperPi. Given that none of these tests uses SSE4, the extra performance is due solely to the other architectural tweaks and enlarged Level 2 cache of the Penryn CPU.
The 45nm transistors in the Penryn-based CPU should also consume less power and create less waste heat. We used an Arctic Cooling Freezer 7 Pro HSF with a freshly applied layer of TIM on the CPUs, and swapped them in and out of an Asus P5K Premium WiFi-AP motherboard. Two instances of Orthos were used to fully load all four cores of whichever CPU we were testing (Orthos can only address two cores, so affinities had to be defined as well). We saw enormous temperature differences between the two CPUs: the QX6850 ran at a toasty 69°C, while the QX9650 peaked at a much cooler 56°C. We also measured the power draw of the PC at the mains with both CPUs running flat out. The difference of 48W is remarkable - with the QX9650 in place and fully stressed, the PC drew only 232W. Even if you aren't a lettuce-eating eco warrior, at least you can use a less powerful (and hopefully quieter) cooler with the QX9650.
This being Custom PC, it would be a crime not to overclock the QX9650. Our test QX6850 gives up the ghost at 3.76GHz using air cooling, so that was our starting point for the 45nm QX9650. However, even at 4GHz (with a vcore of 1.525V and the multiplier raised from 9 to 12), the QX9650 was hungry for more. By dropping the multiplier back to 9, but increasing the FSB from 333MHz to 466MHz (1,864MHz effective) and the vcore to 1.65V, the QX9650 was happy to run at 4.19GHz with air cooling, but refused to go any higher. At these settings, the video encoding score jumped from 1,671 all the way up to 2,357 (a 41 per cent increase), while SuperPi was completed in 36 per cent less time and CineBench R10 ran 32 per cent faster.
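The overclocking figures above all follow from simple arithmetic: core clock is the FSB base clock multiplied by the CPU multiplier, and the 'effective' FSB figure is the base clock multiplied by four, because Intel's FSB is quad-pumped. The short Python sketch below reproduces the numbers; the function names are ours, purely for illustration.

```python
def core_clock_mhz(fsb_mhz, multiplier):
    """Core clock = FSB base clock x CPU multiplier."""
    return fsb_mhz * multiplier


def effective_fsb_mhz(fsb_mhz, transfers_per_clock=4):
    """Quad-pumped FSB: four transfers per base clock cycle."""
    return fsb_mhz * transfers_per_clock


def percent_increase(before, after):
    """Relative increase from `before` to `after`, in per cent."""
    return (after - before) / before * 100.0


# Stock: 333MHz x 9, roughly 3GHz
print(core_clock_mhz(333, 9))            # 2997
# First overclock: 333MHz x 12, roughly 4GHz
print(core_clock_mhz(333, 12))           # 3996
# Final overclock: 466MHz x 9, roughly 4.19GHz
print(core_clock_mhz(466, 9))            # 4194
# Effective FSB at 466MHz base clock
print(effective_fsb_mhz(466))            # 1864
# Video encoding score gain, about 41 per cent
print(round(percent_increase(1671, 2357)))  # 41
```

Note that the final clock speed is about 40 per cent above stock, which lines up neatly with the 41 per cent jump in the video encoding score.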
Conclusion
If the Penryn architecture only yielded an average performance increase of 6 per cent over the Core architecture, this alone would be considered a good step forward for Intel on this tricky new manufacturing process. However, the Penryn architecture has more to offer than just increased performance; the 45nm manufacturing process makes for lower power consumption and a reduction in the waste heat produced. You won't need a cooler as powerful or noisy as those needed with previous Core 2 Extreme CPUs. Taking our benchmarks as the most reliable metric, the performance per watt of the QX9650 is 28 per cent higher than that of the QX6850. This is great news for environmentalists and people with an aversion to huge electricity bills.
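For the curious, the 28 per cent performance-per-watt figure can be reconstructed from the numbers quoted earlier. The Python sketch below is one plausible way of arriving at it, not necessarily the exact method used: it assumes the 6 per cent benchmark lead as the relative performance, and uses whole-system power at the mains (232W for the QX9650, and 48W more for the QX6850) rather than CPU-only power.

```python
def perf_per_watt_gain(speedup, new_watts, old_watts):
    """Percentage gain in performance per watt of the new part over the old.

    speedup   - new performance relative to old (1.06 means 6 per cent faster)
    new_watts - power draw with the new part under load
    old_watts - power draw with the old part under load
    """
    return (speedup * old_watts / new_watts - 1.0) * 100.0


# Assumption: whole-PC power draw at the mains, both CPUs fully loaded.
qx9650_watts = 232
qx6850_watts = qx9650_watts + 48  # the measured 48W difference gives 280W

gain = perf_per_watt_gain(1.06, qx9650_watts, qx6850_watts)
print(round(gain))  # 28
```

Because these are mains figures for the whole PC, the efficiency gap between the two CPUs on their own is likely to be larger than this.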
Of course, the smaller transistors of the QX9650 not only make it frugal with the power, but also very overclockable. It's strange that Intel didn't supply the chip at a higher standard frequency - especially considering that our CPU easily blasted through the 4GHz barrier on a £13 heatsink, and we've already seen reports of 5.5GHz QX9650s.
As a Core 2 Extreme CPU, it's very expensive and therefore only for extreme overclockers, or people who want a cooler, faster PC and will pay anything to get it. However, for those who can't afford to splash out more than £600 on a new CPU, the QX9650 marks the beginning of a range of cool-running, power-efficient processors. The future for Intel looks rosy, with 45nm dual-core and quad-core parts due out in the next few months, plus Penryn-based Xeons and mobile CPUs. If only we'd all bought Intel shares while the company was still peddling crappy NetBurst processors.