Looking back at two years of Graphics Core Next
ATI, later AMD, has been through several major evolutions of its graphics architecture, mirroring wider developments and major revolutions in the PC and graphics industry. At times Radeon graphics hardware was used by Microsoft in the development of its DirectX API. All roads lead to GCN, which encompasses AMD's vision for a graphics core that can scale from top to bottom: high-end gaming graphics cards, consoles and tablets.
The design goals for the original GCN at launch sound remarkably similar to those for R9. The exceptions are heterogeneous computing, which is still somewhat undeveloped, and Fusion, which has since been rebranded as HSA (Heterogeneous System Architecture).
Compared to Hawaii, Tahiti's layout resembles a 'stack of Lego bricks'.
Hawaii's GCN '2.0' hardware resources are organised into four units called "Shader Engines", which allows resources to be scaled and shared more effectively. This mirrors NVIDIA's approach with Kepler, where the equivalent building blocks are SMX units, grouped into Graphics Processing Clusters (GPCs).
This makes it easier to scale a GPU down by disabling an entire Shader Engine (or SMX unit), rather than re-spinning the whole chip with fewer cores or fusing off individual clusters of cores. Some resources, such as the render back-ends and caches, are still shared within each Shader Engine.
Each Shader Engine contains one rasteriser and one geometry unit, and work is load-balanced across the engines; a single Shader Engine is sufficient to operate the entire GPU.
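To make the scaling idea concrete, here is a minimal Python sketch of the Shader Engine hierarchy. The figure of 64 shaders per Compute Unit follows from the article's own numbers (2816 shaders across 44 CUs). The class names, the `enabled` flag and the round-robin spreading of primitives over engines are hypothetical simplifications for illustration, not AMD's actual load-balancing scheme.

```python
from dataclasses import dataclass, field

# Derived from the article's figures: 2816 shaders / 44 Compute Units = 64.
SHADERS_PER_CU = 64


@dataclass
class ShaderEngine:
    compute_units: int      # Hawaii: 11 per Shader Engine
    enabled: bool = True    # hypothetical flag: False models a fused-off engine

    def shaders(self) -> int:
        return self.compute_units * SHADERS_PER_CU if self.enabled else 0


@dataclass
class Gpu:
    engines: list = field(default_factory=list)

    def active_engines(self) -> list:
        return [se for se in self.engines if se.enabled]

    def total_shaders(self) -> int:
        return sum(se.shaders() for se in self.engines)

    def distribute(self, primitives: int) -> list:
        """Spread primitives over the active engines' geometry units.

        Round-robin is an assumption made for illustration only.
        """
        active = self.active_engines()
        if not active:
            raise RuntimeError("at least one Shader Engine must be enabled")
        counts = [0] * len(active)
        for i in range(primitives):
            counts[i % len(active)] += 1
        return counts


hawaii = Gpu([ShaderEngine(11) for _ in range(4)])
print(hawaii.total_shaders())        # 2816
hawaii.engines[3].enabled = False    # hypothetical cut-down part with one engine disabled
print(hawaii.total_shaders())        # 2112
print(hawaii.distribute(10))         # [4, 3, 3]
```

Disabling an engine simply removes its Compute Units from the total while the remaining engines keep accepting work, which is the property that lets one die serve several products.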
This diagram shows a simplified overview of the graphics pipeline:
- The graphics command processor oversees operations across the load-balanced resources.
- Geometry is set up and tessellated in the geometry processors. Data can be exchanged with the compute units if needed, or sent directly to the rasterisers.
- The rasterisers convert the geometry into pixels, handling the assignment (partitioning) of pixels across the screen as well as Hierarchical Z testing, i.e. coarse rejection of pixels based on their depth in the scene.
- The compute units then execute pixel shaders on those pixels, or perform general-purpose GPU compute work.
- Finally, the Render Back Ends handle per-pixel depth testing as well as stencil and colour operations.
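The split between the rasteriser's coarse Hierarchical Z test and the Render Back Ends' fine per-pixel depth test can be illustrated with a toy model. This is a sketch under simplifying assumptions: square 4×4-pixel tiles, a "less than" depth test, and no early per-pixel Z. The tile size and all class and function names are hypothetical and do not describe AMD's hardware implementation.

```python
import math
from typing import Callable

TILE = 4  # hypothetical tile size: square 4x4-pixel tiles


class DepthBuffer:
    """Per-pixel depth buffer plus a coarse per-tile 'farthest depth' (hierarchical Z).

    Smaller z means closer to the camera; a fragment passes if z < stored depth.
    """

    def __init__(self, width: int, height: int):
        self.width, self.height = width, height
        self.depth = [[math.inf] * width for _ in range(height)]
        tiles_x = (width + TILE - 1) // TILE
        tiles_y = (height + TILE - 1) // TILE
        # Conservative bound: no pixel in a tile is farther away than tile_max.
        self.tile_max = [[math.inf] * tiles_x for _ in range(tiles_y)]

    def hiz_reject(self, x: int, y: int, z: float) -> bool:
        """Coarse test (rasteriser): z is behind every pixel already in the tile."""
        return z >= self.tile_max[y // TILE][x // TILE]

    def depth_test_and_write(self, x: int, y: int, z: float) -> bool:
        """Fine per-pixel depth test (Render Back End)."""
        if z >= self.depth[y][x]:
            return False
        self.depth[y][x] = z
        self._refresh_tile(x // TILE, y // TILE)
        return True

    def _refresh_tile(self, tx: int, ty: int) -> None:
        ys = range(ty * TILE, min((ty + 1) * TILE, self.height))
        xs = range(tx * TILE, min((tx + 1) * TILE, self.width))
        self.tile_max[ty][tx] = max(self.depth[y][x] for y in ys for x in xs)


def draw_fragment(buf: DepthBuffer, x: int, y: int, z: float,
                  shade: Callable[[int, int], str]) -> str:
    if buf.hiz_reject(x, y, z):
        return "culled early (hierarchical Z)"   # the pixel shader never runs
    colour = shade(x, y)                         # compute units run the pixel shader
    if buf.depth_test_and_write(x, y, z):
        return f"written {colour}"
    return "failed per-pixel depth test"


buf = DepthBuffer(8, 8)
for y in range(4):                               # cover tile (0, 0) with a near surface
    for x in range(4):
        buf.depth_test_and_write(x, y, 0.2)
print(draw_fragment(buf, 1, 1, 0.5, lambda x, y: "red"))    # culled early (hierarchical Z)
print(draw_fragment(buf, 1, 1, 0.1, lambda x, y: "blue"))   # written blue
```

The coarse test is safe because `tile_max` never underestimates the farthest pixel in a tile. A fragment it rejects would also fail the per-pixel test, so skipping its pixel shader only saves work.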
To do the actual processing, each of Hawaii's Shader Engines contains 11 Compute Units. The Compute Unit is the smallest physical processing block of the GPU, containing all of the low-level building blocks a compute processor needs to fetch, decode and execute instructions.
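The article does not describe the inside of a Compute Unit. As generally documented for GCN, each CU holds four 16-lane vector SIMD units and runs work in 64-item wavefronts, so one vector instruction for a wavefront takes four cycles on a SIMD. The sketch below encodes that arithmetic; treat the constants as background assumptions rather than figures from this article.

```python
SIMDS_PER_CU = 4      # GCN: four vector SIMD units per Compute Unit
LANES_PER_SIMD = 16   # each SIMD is 16 lanes wide
WAVEFRONT_SIZE = 64   # work-items per wavefront


def cycles_per_vector_op() -> int:
    """A 64-wide wavefront passes through a 16-lane SIMD in 64 / 16 cycles."""
    return WAVEFRONT_SIZE // LANES_PER_SIMD


def lane_schedule(work_item: int) -> tuple:
    """(cycle, lane) on which a work-item's vector instruction executes."""
    return divmod(work_item, LANES_PER_SIMD)


def cu_lanes() -> int:
    return SIMDS_PER_CU * LANES_PER_SIMD


def flops_per_cu_per_clock() -> int:
    # Each lane can retire one fused multiply-add (2 floating-point ops) per clock.
    return cu_lanes() * 2


print(cycles_per_vector_op())    # 4
print(lane_schedule(37))         # (2, 5)
print(cu_lanes())                # 64
print(flops_per_cu_per_clock())  # 128
```

The 64 lanes per CU match the 64 shaders per Compute Unit implied by the article's totals (2816 / 44).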
The final stage is the Render Back Ends, which handle operations relating to the scene's Z (depth), stencil and colour.
That covers the graphics and compute pipelines, but a GPU runs many processors in parallel, and they all need to be fed tasks and directed.
We need a means of scheduling and dispatching work so the GPU can multi-task across its parallel compute units. This is where the Asynchronous Compute Engines (ACEs) come in: Hawaii has eight of them, independent of the Shader Engines. The ACEs queue, store and share data for GPU compute work across the entire GPU, while graphics-specific commands are issued by a separate graphics command processor.
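A toy model of several ACEs feeding a pool of Compute Units is shown below. Only the count of eight ACEs comes from the article. Eight queues per ACE (64 in total) is an assumption, and the round-robin arbitration, whole-task granularity (real hardware dispatches workgroups as wavefronts) and all names are hypothetical simplifications, not AMD's scheduling policy.

```python
from collections import deque

NUM_ACES = 8         # Hawaii, per the article
QUEUES_PER_ACE = 8   # assumption: eight queues per ACE


class Ace:
    """Toy Asynchronous Compute Engine: a set of task queues served round-robin."""

    def __init__(self, ace_id: int):
        self.ace_id = ace_id
        self.queues = [deque() for _ in range(QUEUES_PER_ACE)]
        self._next = 0   # which of this ACE's queues to look at first

    def submit(self, queue: int, task: str) -> None:
        self.queues[queue].append(task)

    def pop_ready(self):
        """Return the next queued task, scanning queues round-robin; None if all are empty."""
        for _ in range(QUEUES_PER_ACE):
            q = self.queues[self._next]
            self._next = (self._next + 1) % QUEUES_PER_ACE
            if q:
                return q.popleft()
        return None


def dispatch(aces, num_cus: int):
    """Give each ACE a turn per round and assign its task to the next Compute Unit."""
    assignments = []
    cu = 0
    while True:
        issued = False
        for ace in aces:
            task = ace.pop_ready()
            if task is not None:
                assignments.append((task, cu))
                cu = (cu + 1) % num_cus
                issued = True
        if not issued:
            return assignments


aces = [Ace(i) for i in range(NUM_ACES)]
aces[0].submit(0, "physics-a")
aces[0].submit(0, "physics-b")
aces[1].submit(3, "post-fx")
print(dispatch(aces, num_cus=44))
# [('physics-a', 0), ('post-fx', 1), ('physics-b', 2)]
```

Because each engine gets a turn every round, a long backlog on one ACE cannot starve the others. This is the kind of independence the ACEs give compute work alongside the graphics command processor.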
In summary, the layout of the Graphics Core Next architecture 'version 2', as used in the 290X, is essentially a scaled-up version of Tahiti.
GCN '2.0' supports:
- 1-8 Asynchronous Compute Engines
- 1-4 Shader Engines
- 1-11 Compute Units per Shader Engine, giving 64 to 704 shaders per Shader Engine
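The ranges above can be expressed as a small configuration check. The limits are taken from the list, and the 64 shaders per Compute Unit is derived from its 64-704 per-engine range. The function names are hypothetical, and the sketch only checks arithmetic: it says nothing about which configurations AMD actually shipped.

```python
# Supported ranges for GCN '2.0', as listed in the article (inclusive).
LIMITS = {
    "aces": (1, 8),
    "shader_engines": (1, 4),
    "cus_per_engine": (1, 11),
}
SHADERS_PER_CU = 64  # derived: 1-11 CUs giving 64-704 shaders per engine


def shaders_per_engine(cus_per_engine: int) -> int:
    return cus_per_engine * SHADERS_PER_CU


def validate_config(aces: int, shader_engines: int, cus_per_engine: int) -> int:
    """Check a (hypothetical) configuration against the ranges; return total shaders."""
    values = {"aces": aces, "shader_engines": shader_engines,
              "cus_per_engine": cus_per_engine}
    for name, value in values.items():
        lo, hi = LIMITS[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside the supported range {lo}-{hi}")
    return shader_engines * shaders_per_engine(cus_per_engine)


print(shaders_per_engine(1), shaders_per_engine(11))  # 64 704
print(validate_config(8, 4, 11))                       # 2816 (Hawaii / R9 290X)
```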
Tahiti v Hawaii – spec comparison
| | AMD Graphics Core Next 'Tahiti' (HD 7970) | AMD Graphics Core Next 'Hawaii' (R9 290X) | Increase |
|---|---|---|---|
| Compute Units / IEEE 754-2008 compliant shaders | 32 / 2048 | 44 / 2816 | 1.4x |
| Geometry processors | 2 | 4 | 2.0x |
| Render Back-Ends | 8 | 16 | 2.0x |
| L2 cache | 768 KB | 1 MB | 1.3x |
| Memory bus | 384-bit GDDR5, 264 GB/s | 512-bit GDDR5, 320 GB/s | 1.2x |
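The "Increase" column is simply the Hawaii figure divided by the Tahiti figure, rounded to one decimal place. The memory bandwidth figures follow from bus width times per-pin data rate. The data rates used below (5.5 Gbit/s for the HD 7970, 5.0 Gbit/s for the R9 290X) are assumptions not stated in the article, chosen because they reproduce its bandwidth numbers.

```python
def increase(old: float, new: float) -> float:
    """The table's 'Increase' column: Hawaii value divided by Tahiti value."""
    return new / old


def gddr5_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth = bus width (bits) x per-pin data rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


# (Tahiti, Hawaii) values taken from the table above.
ROWS = {
    "Compute Units": (32, 44),
    "Geometry processors": (2, 4),
    "Render Back-Ends": (8, 16),
    "L2 cache (KB)": (768, 1024),
    "Bandwidth (GB/s)": (264, 320),
}

for name, (tahiti, hawaii) in ROWS.items():
    print(f"{name:20} {increase(tahiti, hawaii):.2f}x")

# ASSUMED per-pin data rates: 5.5 Gbit/s (HD 7970) and 5.0 Gbit/s (R9 290X).
print(gddr5_bandwidth_gb_s(384, 5.5), gddr5_bandwidth_gb_s(512, 5.0))  # 264.0 320.0
```

The unrounded ratios (1.38x for Compute Units, 1.33x for L2, 1.21x for bandwidth) show the wider bus buys less than the 512-bit width alone suggests, because the assumed per-pin rate is lower.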
In addition to the increased GPU resources, Hawaii adds updated display controllers for Eyefinity, AMD TrueAudio and a new version of CrossFire (XDMA, which transfers data over PCI Express rather than a bridge).
GCN v Kepler Architecture Performance & Efficiency – spec comparison
| | AMD Radeon HD 7970 | AMD Radeon R9 290X | Increase | NVIDIA GeForce GTX 780 | NVIDIA GeForce GTX TITAN 'Kepler' |
|---|---|---|---|---|---|
| Geometry processing | 2.1 billion primitives/s | 4 billion primitives/s | 1.9x | | |
| Compute | 4.3 TFLOPS | 5.6 TFLOPS | 1.3x | 4.0 TFLOPS | 4.5 TFLOPS |
| Texture fill rate | 134.4 Gtexels/s | 176 Gtexels/s | 1.3x | 166 Gtexels/s | 188 Gtexels/s |
| Pixel fill rate | 33.6 Gpixels/s | 64 Gpixels/s | 1.9x | 41.4 Gpixels/s | 40.2 Gpixels/s |
| Peak bandwidth | 264 GB/s | 320 GB/s | 1.2x | 288 GB/s | 288 GB/s |
| Die area | 352 mm² | 438 mm² | 1.24x | 561 mm² | 561 mm² |
| Peak GFLOPS/mm² | 12.2 | 12.8 | 1.05x | 7.1 | 8.0 |
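Most of the rates in this table are unit counts multiplied by core clock. Compute is shaders × 2 FLOPs (one fused multiply-add) × clock, texture fill is texture units × clock, and pixel fill is pixels written per clock × clock. The sketch below reproduces them. Only the die areas and the AMD shader counts come from the article. The clocks (1050 MHz for the HD 7970 column, 1000 MHz for the R9 290X, 863 and 837 MHz base for the GTX 780 and TITAN), the NVIDIA shader counts and all texture-unit and ROP counts are assumptions that happen to match the quoted figures.

```python
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    clock_ghz: float   # ASSUMED core clock, not stated in the article
    shaders: int       # AMD counts from the article; NVIDIA counts ASSUMED
    tmus: int          # ASSUMED texture units (texels per clock)
    rops: int          # ASSUMED pixels written per clock
    die_mm2: float     # die area from the table

    def tflops(self) -> float:
        # One fused multiply-add per shader per clock counts as 2 FLOPs.
        return self.shaders * 2 * self.clock_ghz / 1000

    def gtexels(self) -> float:
        return self.tmus * self.clock_ghz

    def gpixels(self) -> float:
        return self.rops * self.clock_ghz

    def gflops_per_mm2(self) -> float:
        return self.tflops() * 1000 / self.die_mm2


CARDS = [
    Card("Radeon HD 7970", 1.05, 2048, 128, 32, 352),
    Card("Radeon R9 290X", 1.00, 2816, 176, 64, 438),
    Card("GeForce GTX 780", 0.863, 2304, 192, 48, 561),
    Card("GeForce GTX TITAN", 0.837, 2688, 224, 48, 561),
]

for c in CARDS:
    print(f"{c.name:18} {c.tflops():4.1f} TFLOPS {c.gtexels():6.1f} Gtexels/s "
          f"{c.gpixels():5.1f} Gpixels/s {c.gflops_per_mm2():5.1f} GFLOPS/mm^2")
```

With these assumed values the output matches the table to within rounding (for example 165.7 versus 166 Gtexels/s for the GTX 780). The one visible difference is the R9 290X's density. From the unrounded 5.63 TFLOPS it comes out at about 12.9 GFLOPS/mm², while the table's 12.8 appears to use the rounded 5.6 TFLOPS.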
Peak compute rises by a comparatively modest 1.3x, but the engines that feed and drain the shaders are much stronger. Geometry and pixel fill throughput rise by about 1.9x, for only a 24% increase in die size. On paper the 290X is also more area-efficient than the Kepler GK110-based GTX 780 thanks to its 'higher horsepower' design, delivering 12.8 peak GFLOPS/mm² against 7.1 from a much smaller die (438 mm² versus 561 mm²).
On paper, the 290X is a good step up from the previous-generation HD 7970. The GeForce cards' lower peak compute is expected, as this is a hallmark of NVIDIA's consumer-oriented GPUs.