

Verdict: ATI finally unveils the Radeon HD 4850. It's the best Radeon since the 9800 Pro - cheaper than its Nvidia rivals, and faster in many games.
While Nvidia has been busy making the most stupendously large GPU ever made, the GeForce GTX 280, ATI has been beavering away at making the RV770 chip at the heart of the HD 4850 and HD 4870 as efficient at running games as possible. We're told that the engineers designed a super-advanced GPU modelling suite to test architectural revisions and tweaks for fully six months before finally committing a design to silicon.
The GPU modelling suite allowed the ATI engineers to watch how data flowed through each revision of the GPU clock by clock, and to experiment with different layouts and arrangements of the various sub-units that comprise a modern GPU.
Before delving into all the clever architectural tweaks, let's first see what's inside the new chip.
| Specification | Radeon HD 4870 | Radeon HD 4850 | GeForce GTX 260 | GeForce 8800 GT |
| GPU | RV770 | RV770 | GTX 260 | G92 |
| DirectX support | Direct3D 10.1 | Direct3D 10.1 | Direct3D 10 | Direct3D 10 |
| Core speed | 750MHz | 625MHz | 576MHz | 600MHz |
| Stream processors | 800 | 800 | 192 | 112 |
| Stream processor speed | 750MHz | 625MHz | 1.242GHz | 1.5GHz |
| Memory | 512MB GDDR5 | 512MB GDDR3 | 896MB GDDR3 | 512MB GDDR3 |
| Memory speed | 1.8GHz (3.6GHz effective) | 1GHz (2GHz effective) | 999MHz (1.998GHz effective) | 900MHz (1.8GHz effective) |
| Memory interface | 256-bit | 256-bit | 448-bit | 256-bit |
| Memory bandwidth | 115GB/sec | 64GB/sec | 112GB/sec | 57.6GB/sec |
| ROPs | 16 | 16 | 28 | 16 |
| Texture units | 40 | 40 | 80 | 56 |
| Tessellation unit | Yes | Yes | No | No |
| Power inputs | 2 x 6-pin PCI-E | 1 x 6-pin PCI-E | 2 x 6-pin PCI-E | 1 x 6-pin PCI-E |
| Maximum card power draw | 160W | 110W | 182W | 105W |
The differences between a HD 4850 and a HD 4870 are few but significant. Both GPUs are indeed based on the same actual chip (codenamed RV770) but are clocked at different frequencies. The HD 4870 also uses GDDR5 memory, which allows for much higher memory frequencies and thus greater memory bandwidth while still using a 256-bit memory interface. We'll talk more about why GDDR5 is more overclockable, and why the adoption of GDDR5 by major memory manufacturers could leave Nvidia in a pickle, later.
A Sapphire HD 4850 costs £122 inc VAT, while pricing for the Radeon HD 4870 still hasn't been fully confirmed. ATI gives an SRP (suggested retail price) of $299, and seeing as the SRP of the HD 4850 is $199, that should translate very well into pounds sterling – we hope for roughly £150 inc VAT. Expect a round-up of prices as soon as the major retailers activate their pricing pages.
Considering those specs – 800 stream processors, 512MB of video RAM as standard and all the rest – both the HD 4850 and HD 4870 seem very keenly priced. As we've previously said, ATI has worked hard on getting the most performance out of each transistor in the HD 4850 and HD 4870. Nowhere is this more evident than in the all-new texture units of the HD 4800-series.
EFFICIENT TEXTURE UNITS
The HD 3850 and 3870 GPUs were based on the RV670 design, and had six stream processor clusters of 80 stream processors each. The new RV770 design ups the number of stream processor clusters to ten, giving that eye-opening count of 800 stream processors. This rise in the number of clusters has led to a big rethink of how work is dished out to them.
The ATI design has two layers of texture cache for the stream processors to address, with the stream processors only fetching texture data from graphics memory if that data wasn’t found in the Level 1 or Level 2 texture cache first. As fetching texture data uses a lot of memory bandwidth, RV670 had a redundant cache architecture, where each cache held identical data. This was fine, as with only six clusters working on six small eight-pixel by eight-pixel chunks of the frame being rendered, the chances were they’d all actually need the same texture data.
The same texture cache architecture just wouldn’t work well with the increased number of clusters in RV770 though, so AMD did some rather clever tweaking. All the texture cache units are now fully autonomous, and the Level 2 cache units are now actually worthy of the name rather than being passive data stores as in RV670.
Even more impressively, the Level 2 texture cache units can broadcast their content to all the Level 1 units. So, if a cluster needs to look up some texture data it first goes to the Level 1 texture cache, and if it can’t find the data, it’ll have all ten Level 2 texture caches rushing to get it the data. Only if none of the Level 2 texture caches has the required data will the cluster have to use the memory interface to fetch data from video memory.
With this autonomous texture cache approach ATI has attempted to minimise texture cache misses and also texture overfetch, and thus reduce the strain on the memory interface. And as the internal bandwidth between texture units is very high (up to 384GB/sec between the Level 1 and Level 2 units), there’s less time spent waiting for texture data that’s already been fetched than if the cluster had to go to video memory anyway.
ATI claims that the efficiency of the texture fetch and texture cache system gives the HD 4870 a far greater texture fill rate than even the Nvidia GeForce GTX 280 – as much as 781Gtexels/sec compared to 672Gtexels/sec. This despite the Nvidia GPU having 80 texture units running at 1.296GHz to the 40 of the HD 4870 running at 750MHz. Quite an incredible claim.
ATI has also rearranged the layout of the memory controller (or, more accurately, memory controllers) in the HD 4850 and 4870. Rather than one massive memory controller, GPU memory controllers are comprised of small units placed around the die so as to be as near as possible to the memory chips on a graphics card. This leads to a certain flexibility when arranging units around these memory controllers, which ATI seems to have taken advantage of.
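The lookup order described above can be sketched as a toy model in Python. This is purely illustrative and rests on several assumptions: the class name, the dictionary-based caches and the rule for choosing which Level 2 unit stores a newly fetched texel are all hypothetical, and real hardware works on cache lines rather than single addresses.

```python
class TextureCacheModel:
    """Toy model of the L1 -> L2 -> video memory lookup order (hypothetical)."""

    def __init__(self, video_memory, num_clusters=10):
        self.video_memory = video_memory                 # address -> texel data
        self.l1 = [dict() for _ in range(num_clusters)]  # one L1 cache per cluster
        self.l2 = [dict() for _ in range(num_clusters)]  # ten L2 units, per the article
        self.memory_fetches = 0                          # trips over the memory interface

    def fetch(self, cluster, address):
        """Return (data, source), where source is 'L1', 'L2' or 'memory'."""
        l1 = self.l1[cluster]
        if address in l1:
            return l1[address], "L1"
        # L1 miss: any of the L2 units may supply the data to this cluster's L1.
        for l2 in self.l2:
            if address in l2:
                l1[address] = l2[address]
                return l1[address], "L2"
        # Miss everywhere: only now is video memory used.
        data = self.video_memory[address]
        self.memory_fetches += 1
        # Assumption: the L2 unit is picked by address; the article does not say how.
        self.l2[address % len(self.l2)][address] = data
        l1[address] = data
        return data, "memory"


model = TextureCacheModel({0: "texel-a", 1: "texel-b"})
print(model.fetch(0, 0))  # cluster 0, first access
print(model.fetch(0, 0))  # cluster 0 again
print(model.fetch(3, 0))  # a different cluster asking for the same texel
print("memory fetches:", model.memory_fetches)
```

In this sketch the second cluster to ask for a texel never touches video memory, which is the saving in memory bandwidth the design is aiming for.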
The highest memory bandwidth hogs are the ROPs and the Level 2 texture caches, so ATI has physically placed these units next to the memory controllers and used a thousand or so traces to connect them. This gives these bandwidth-intensive units great access to the video memory, while the various memory controllers are all also connected to a memory hub to handle the ‘relatively low-bandwidth traffic such as the PCI-E interconnect, CrossFire interconnect and so forth.’
GDDR5
While we’re talking about memory controllers, let’s talk about the memory they’ll be connected to. HD 4850 cards will have GDDR3 memory, while Radeon HD 4870 cards will have GDDR5 memory. The HD 4870 can afford to stick with a 256-bit memory interface because of the higher frequencies of GDDR5 – rather than build a large, complicated and expensive 512-bit memory controller to get a high amount of memory bandwidth, ATI has chosen to up the frequency at which the 256-bit interface of the HD 4870 operates.
The 1GHz (2GHz effective) GDDR3 of the HD 4850 and its 256-bit memory interface gives it a fair 64GB/sec of memory bandwidth, while the 1.8GHz (3.6GHz effective) GDDR5 of the HD 4870 on the same 256-bit interface gives it a far greater 115GB/sec of memory bandwidth.
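For anyone wanting to check those figures, peak memory bandwidth is simply the effective memory clock multiplied by the width of the interface in bytes. The short Python sketch below does the sum for all four cards in the spec table; the function name is our own invention, and the result is a theoretical peak rather than anything you'd measure in a game.

```python
def memory_bandwidth_gb_per_sec(effective_clock_ghz, bus_width_bits):
    """Theoretical peak bandwidth: transfers per second x bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_clock_ghz * bytes_per_transfer


# (effective memory clock in GHz, memory interface width in bits), from the spec table
cards = {
    "Radeon HD 4870": (3.6, 256),
    "Radeon HD 4850": (2.0, 256),
    "GeForce GTX 260": (1.998, 448),
    "GeForce 8800 GT": (1.8, 256),
}

for name, (clock_ghz, width_bits) in cards.items():
    bandwidth = memory_bandwidth_gb_per_sec(clock_ghz, width_bits)
    print(f"{name}: {bandwidth:.1f}GB/sec")
```

The results agree with the quoted bandwidth figures once rounded, and show how the HD 4870 roughly matches the 448-bit GeForce GTX 260 on clock speed alone.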
GDDR5 has a handful of advantages over GDDR3 and GDDR4. First and foremost is increased frequency, which is due to per-bit de-skew, on-the-fly error correction and greater signal integrity. This all means that GDDR5 should be very overclockable. It also consumes less power than GDDR3.
So why is GDDR5 a potential problem for Nvidia? Well, the spec for GDDR5 was pushed through the JEDEC memory standardisation board by ATI. We stress that GDDR5 is an open standard – in fact three memory manufacturers are already making GDDR5 chips – but it could mean that Nvidia might have to redesign its memory controller to support GDDR5. The new chips might be pin-compatible, but there’s no guarantee that Nvidia can switch to GDDR5 on a whim.
AA AND ROPS
ATI says it has totally rebuilt the ROPs and the back-end of this GPU, finally (if tacitly) admitting that having hardware that only does AA as shader code (as with the R600 GPU of the Radeon HD 2000-series) isn’t a good idea at the moment. The HD 3000-series had ROPs that could do AA, but ATI clearly thought it could do better.
The new ROPs have advanced AA abilities – a doubling of peak rate depth and stencil operations per clock to 64 for example. While a HD 3000-series GPU could only output eight pixels/clock when using 2x or 4x AA with 32-bit colour, a HD 4000-series GPU can output 16. As the graphics card can also output 16 pixels/clock when you’re using no AA, you should be able to use 4x AA without much of a frame rate hit. A HD 4000-series GPU can output eight pixels per clock when using 8x AA rather than just four.
ATI is also keen to push custom filter AA (CFAA), which it says is improved with the HD 4000-series. CFAA aims to detect the edges of shapes in a frame, and smooth these edges. This, ATI says, is a far better way to do AA than the standard sample patterns of AA which tend to have roughly square scatter patterns. It’s the difference between smudging a line of a charcoal sketch along the line to smooth it, rather than smudging individual points along the line.
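To put those per-clock figures into context, here's a rough Python sketch that turns them into peak pixel output. It assumes the peak rate is just pixels per clock multiplied by the core clock from the spec table, which ignores memory bandwidth and every other real-world limit; the names used are our own.

```python
# Pixels output per clock with 32-bit colour on a HD 4000-series GPU, per the article
PIXELS_PER_CLOCK = {"none": 16, "2x": 16, "4x": 16, "8x": 8}

# Core clocks in MHz, from the spec table
CORE_CLOCK_MHZ = {"Radeon HD 4850": 625, "Radeon HD 4870": 750}


def peak_fill_rate_gpixels(card, aa_mode):
    """Assumed peak output in Gpixels/sec: pixels per clock x core clock."""
    return PIXELS_PER_CLOCK[aa_mode] * CORE_CLOCK_MHZ[card] / 1000


for card in CORE_CLOCK_MHZ:
    for aa_mode in PIXELS_PER_CLOCK:
        rate = peak_fill_rate_gpixels(card, aa_mode)
        print(f"{card}, AA {aa_mode}: {rate:.1f}Gpixels/sec")
```

On these assumptions the peak output is identical with no AA and with 4x AA, and halves only when you step up to 8x AA.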
Only thorough testing and benchmarks will tell though, and that’ll have to wait for another day.
PERFORMANCE
You’ve waited long enough for the benchmark results, so let’s not tease you any longer. First the bad news – we didn’t have a HD 4870 to test, only the lesser HD 4850.
However, it’s incredible that we’re even considering testing a £122 card at the native resolution of a 30in TFT (2,560 x 1,600) – that the HD 4850 comes very close to running a few of our test games at playable frame rates at this resolution is stunning. We used a Leadtek PX8800GT ZL card to compare against the HD 4850, which is based on an overclocked GeForce 8800 GT GPU with 512MB of RAM. This costs about £140 but should provide some stiff competition for the £122 HD 4850.
Race Driver: Grid is our favourite game of the moment, so we kicked off with that. Both cards could handle the game at 1,680 x 1,050 with 2x AA with ease, but the frame rates show the HD 4850 to be far superior with a minimum of 60fps to the 44fps of the 8800 GT card. This is doubly impressive as we were testing with a BETA version of Catalyst 8.6 while the ForceWare 175.19 driver of the GeForce cards is very mature.
At 1,920 x 1,200 with 4x AA both cards again gave playable frame rates – usually the native resolution of a 24in TFT is the sole domain of £300+ graphics cards, but modern mid-range cards are extremely powerful. The HD 4850 again edged the overclocked 8800 GT to the win with a minimum frame rate of 41fps versus 33fps. Even at 2,560 x 1,600 with 4x AA (the native resolution of a 30in TFT) the HD 4850 struggled manfully to run the game – entire sections were smooth and playable, though smoke and large areas of new terrain proved a problem. The 8800 GT didn’t put up much of a fight at this resolution.
Eager to see just how powerful the new card was, we launched Crysis. The performance of the two cards was close, with the 8800 GT edging ahead at the higher resolution. At 1,680 x 1,050 with 2x AA we measured a 19fps minimum frame rate from the HD 4850 and 17fps minimum from the GeForce card, but at 1,920 x 1,200 with 4x AA we saw a minimum frame rate of 14fps from the 8800 GT card versus just 8fps from the HD 4850.
Call of Duty 4 was a better game for the HD 4850, with higher frame rates than the 8800 GT at every resolution. Neither card could play the game smoothly at 2,560 x 1,600, but the 8800 GT dipped below the 25fps minimum ‘unplayable’ cut-off at 1,920 x 1,200. The HD 4850 played CoD4 at 1,920 x 1,200 with 4x AA at 30fps minimum.
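The minimum frame rates quoted so far can be checked against that 25fps cut-off with a few lines of Python. The figures below are only the ones stated in the text above; where no number is given (the 8800 GT in Call of Duty 4, which we've only said fell below 25fps) the entry is left empty. Treating exactly 25fps as playable is an assumption on our part.

```python
PLAYABLE_MIN_FPS = 25  # the cut-off quoted in the review

# (game, settings, HD 4850 minimum fps, 8800 GT minimum fps)
# None means the text gives no figure for that card.
RESULTS = [
    ("Race Driver: Grid", "1,680 x 1,050, 2x AA", 60, 44),
    ("Race Driver: Grid", "1,920 x 1,200, 4x AA", 41, 33),
    ("Crysis", "1,680 x 1,050, 2x AA", 19, 17),
    ("Crysis", "1,920 x 1,200, 4x AA", 8, 14),
    ("Call of Duty 4", "1,920 x 1,200, 4x AA", 30, None),
]


def verdict(min_fps):
    """Classify a minimum frame rate against the cut-off."""
    if min_fps is None:
        return "not stated"
    # Assumption: a minimum of exactly 25fps counts as playable.
    return "playable" if min_fps >= PLAYABLE_MIN_FPS else "unplayable"


for game, settings, hd4850, gt8800 in RESULTS:
    print(f"{game} ({settings}): HD 4850 {verdict(hd4850)}, 8800 GT {verdict(gt8800)}")
```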
Company of Heroes: Opposing Fronts was our last test game, and gave interesting and frustrating results. In DirectX 9 mode, the overclocked GeForce 8800 GT had the lead. When you ramp the resolution up to 2,560 x 1,600, the HD 4850 stutters along with a minimum frame rate of 21fps, while the 8800 GT can still run the game smoothly with a minimum of 33fps. The 8800 GT is also a touch faster at every other resolution.
However, when we switched to DirectX 10 mode, the game refused to launch on the 8800 GT card at all. Even a total uninstall and re-install of the game didn’t give us any joy. We can’t say for sure whether it’s the ForceWare driver or the game (or possibly our test rig) that’s at fault, but we couldn’t produce any scores. We could benchmark the HD 4850, but the scores were terrible. The game was extremely stuttery, as the minimum frame rates of 1fps at every resolution indicate.
We played with 3DMark06 and Vantage, and while the former prefers the 8800 GT card, Vantage clearly runs better on the HD 4850. This is an indication that the HD 4850 has more longevity than the 8800 GT – it handles the advanced graphical effects of Vantage better – but it’s only an indication.
Finally, we ran some power consumption tests, which show that the HD 4850 is more power-hungry than the 8800 GT, but not by a huge amount. We should also point out that the HD 4850 card, with its reference cooler, gets extremely hot. We swapped it out immediately after the last Vantage test and the copper heatsink was scorching. The card never revved its fan particularly hard or noisily, and we didn’t encounter any stability issues while testing, so clearly this card just likes running hot.
CONCLUSION
With some games a touch faster on the 8800 GT and some noticeably faster on the HD 4850, it’s not quite clear-cut as to which is better. The fact that the choice is so narrow is a nice problem for ATI as we tested it against a card with very mature drivers – perhaps a few revisions of Catalyst can unlock some more speed as ATI finds tweaks and optimisations. 3DMark Vantage too shows that the HD 4850 could have more latent power inside it that will keep it up to date with future game releases while 8800 GT cards are being retired.
If you’ve already got an 8800 GT or a comparable (or more powerful) card, then there’s no need to swap it in favour of a HD 4850. And by the time you do – in six months or a year – there’ll be a new graphics card to choose anyway. If your current card is just plain rubbish though, the HD 4850 should prove a better buy than a GeForce 8800 GT at the moment. Just beware that a £150-or-so GeForce 9800 GTX+ arrived in the Labs today and could prove an even better purchase – expect a review online in the next few days.
Test kit: 3GHz Intel Core 2 Extreme QX6850 overclocked to 3.33GHz, Asus Maximus Extreme motherboard, 2GB of Patriot PC3-1500 DDR3 RAM, Windows Vista Ultimate 32-bit, ATI Radeon HD 4850: Catalyst 8.6 BETA, GeForce 8800 GT: ForceWare 175.19
User Reviews
DX10.1 for a (big) tank of petrol anyone ?
"Go back a couple of years and the cards you could buy for less than £150, always involved a serious compromise - but not any more."
Whether you are an ATI or nVidia fanbois - this card is still fantastic. For the red-team, it brings awesome frame rates into the 'anyone can afford em' range. For the green-team-loyalists, it has meant a serious dropping of pants on price for quality kit like the 9600GT.
Review by: Resurrection
Average User Rating:
95%