As the PC graphics industry continues down the path of low-overhead graphics APIs, today I wanted to bring you some new details on two significant features of DirectX® 12. These features are called “multi-threaded command buffer recording” and “async shaders,” and they are poised to make a significant difference for gamers everywhere. Let’s take a look at what they do and why they matter.
ASYNC SHADERS
This feature allows a game engine to execute GPU compute or memory activities during “gaps” in the graphics workload presented by a game.
While it seems sensible to allow the graphics, compute and memory functions of a GPU to operate simultaneously, past versions of DirectX® did not provide for this functionality. Past versions of DirectX® were essentially limited to a single, serial graphics queue for processing all types of workloads. Therefore graphics, compute and memory copy operations had to wait for other parts of the graphics queue to finish processing before springing to life and doing their work. This would often result in idle hardware for some portions of time, and idle hardware is squandered performance.
In contrast, DirectX® 12 Async Shaders supercharge work completion in a compatible AMD Radeon™ GPU by interleaving these tasks across multiple threads to shorten overall render time. Async Shaders are materially important to a PC gamer’s experience because shorter rendering times reduce graphics pipeline latency, and lower latency equals greater performance. “Performance” can mean higher framerates in gameplay and better responsiveness in VR environments. Further, finer levels of granularity in breaking up the workload can yield even greater reductions in work time. As they say: work smarter, not harder.
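To make the serial-versus-overlapped contrast concrete, here is a toy timing model in C++. It is not DirectX® 12 code: the `FrameWork` struct, the function names and the arbitrary time units are all assumptions made for illustration. It also assumes the best case, where tasks have no dependencies on each other and the GPU has enough idle hardware to run every queue at once.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Toy model only: durations are arbitrary time units, not measurements.
struct FrameWork {
    std::vector<double> graphics;  // e.g. shadow pass, main pass
    std::vector<double> compute;   // e.g. lighting, post-processing
    std::vector<double> copy;      // e.g. texture uploads
};

// Total duration of every task placed in one queue.
double totalTime(const std::vector<double>& tasks) {
    return std::accumulate(tasks.begin(), tasks.end(), 0.0);
}

// Single serial queue: every task waits for the one ahead of it.
double serialFrameTime(const FrameWork& w) {
    return totalTime(w.graphics) + totalTime(w.compute) + totalTime(w.copy);
}

// Idealized independent queues (assumption: no dependencies, enough idle
// hardware): the frame is done when the longest queue has drained.
double overlappedFrameTime(const FrameWork& w) {
    assert(!w.graphics.empty() || !w.compute.empty() || !w.copy.empty());
    return std::max({totalTime(w.graphics), totalTime(w.compute), totalTime(w.copy)});
}
```

In this model the overlapped time is a lower bound on what real hardware could reach. Actual gains depend on how much of the GPU sits idle during the graphics work and on how the tasks depend on one another.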
Finally, it must be understood that AMD’s Graphics Core Next architecture is specifically equipped to enable incredibly fine DirectX® 12 Async Shader granularity with dedicated hardware known as the Asynchronous Compute Engine (ACE). Many ACEs serve as fundamental building blocks in modern AMD graphics hardware, and they are specifically tuned to accommodate significant parallelization of complex jobs with superb performance.
This diagram of the AMD Radeon™ R9 290X GPU’s architecture shows eight Asynchronous Compute Engines (ACEs) ready to handle Async Shader work. Each AMD product based on GCN has a certain number of these ACEs.
MULTI-THREADED COMMAND BUFFER RECORDING
The command buffer is a game’s “to-do list,” a list of things that the CPU must reorganize and present to an AMD Radeon™ graphics card so that graphics work can be done. Things on this to-do list might include lighting, placing characters, loading textures, generating reflections and more.
Modern PCs often ship with multi-core CPUs like AMD FX processors or AMD A-Series APUs. One notable characteristic of DirectX® 11-based applications is that many of the cores in a multi-core CPU go partially or fully unutilized. This lack of utilization is owed to DirectX® 11’s relative inability to break a game’s command buffer into small, parallel and computationally quick chunks that can be spread across many cores. In addition to modest multi-threading in DirectX® 11, a disproportionate amount of CPU time is frequently spent on driver and API code (“overhead”) under the DirectX® 11 programming model, which leaves less time for the game code that delivers quality and framerates.
In DirectX® 12, however, the command buffer behavior is radically overhauled in five key ways:
- Overhead is significantly reduced by moving driver and API code to any available CPU thread
- The absolute time required to complete complex CPU tasks is notably reduced
- Game workloads can be meaningfully distributed across more than four CPU cores
- New “bandwidth” on the CPU allows for higher peak draw calls, enabling more detailed and immersive game worlds
- All available CPU cores may now “talk” to the graphics card simultaneously
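The idea behind the list above can be sketched with standard C++ threads. This is a simplified stand-in, not the DirectX® 12 API: `CommandList`, `recordChunk` and `recordFrame` are hypothetical names, and a "command" here is just a string. Each worker thread records into its own list, so no locking is needed, and the lists are handed back in a fixed order so the result is deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an API command list: a plain list of command names.
using CommandList = std::vector<std::string>;

// Record draw commands for objects [first, last) into one list.
void recordChunk(CommandList& out, std::size_t first, std::size_t last) {
    for (std::size_t i = first; i < last; ++i) {
        out.push_back("draw object " + std::to_string(i));
    }
}

// Split objectCount draws across workerCount threads. Each thread writes
// only to its own list, so the threads never contend for shared state.
std::vector<CommandList> recordFrame(std::size_t objectCount, std::size_t workerCount) {
    assert(workerCount > 0);
    std::vector<CommandList> lists(workerCount);
    std::vector<std::thread> workers;
    // Round up so every object is assigned to some worker.
    const std::size_t perWorker = (objectCount + workerCount - 1) / workerCount;
    for (std::size_t w = 0; w < workerCount; ++w) {
        const std::size_t first = std::min(w * perWorker, objectCount);
        const std::size_t last = std::min(first + perWorker, objectCount);
        workers.emplace_back(recordChunk, std::ref(lists[w]), first, last);
    }
    for (auto& t : workers) {
        t.join();  // wait until every list is fully recorded
    }
    // The caller would submit the lists in index order.
    return lists;
}
```

Calling `recordFrame(10, 4)` would, under these assumptions, produce four lists that together hold all ten draw commands in their original order. A real engine would replace the strings with API command lists and submit them to the GPU queue.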
Much like going from a two-lane country road to an eight-lane superhighway, the shift to DirectX® 12 allows more traffic from an AMD FX processor to reach the graphics card in a shorter amount of time. The end result: more performance, better image quality, reduced latency, or a blend of all three (as the developer chooses).
The benefit of this feature is already being seen in real games. Oxide Games and Stardock have collaborated with AMD for Ashes of the Singularity™, an upcoming strategy game that already utilizes all 8 cores of an AMD FX-8370 processor to deliver performance, image quality and resolutions that—in the words of the developer’s CEO Brad Wardell—are “not even a possibility” under DirectX® 11.
In other words, platforms with AMD Radeon™ GPUs and multi-core AMD CPUs using DirectX® 12 are allowing developers to explore game designs previously considered impossible.
Multi-threaded command buffer recording and async shaders are two big features of the base DirectX® 12 specification, each harboring great potential to extract significantly more performance and image quality out of existing hardware.
But many gamers also know that game devs must commit to using a feature before it is seen in the real world—we’re taking care of that. Our collaboration with developers like Oxide/Stardock (and others unannounced) to get cool tech into great games is a guiding light for the AMD Gaming Evolved Program, and we’re already seeing healthy interest in these features. That bodes well for everyone!
Before we part ways, you might be interested to know which AMD products are compatible with DirectX® 12. Presuming you’ve installed Windows® 10 Technical Preview Build 10041 (or later) and obtained the latest driver from Windows Update, here’s the list of DirectX® 12-ready AMD components. We think you’ll agree that it’s an excitingly diverse set of products!
- AMD Radeon™ R9 Series graphics
- AMD Radeon™ R7 Series graphics
- AMD Radeon™ R5 240 graphics
- AMD Radeon™ HD 8000 Series graphics for OEM systems (HD 8570 and up)
- AMD Radeon™ HD 8000M Series graphics for notebooks
- AMD Radeon™ HD 7000 Series graphics (HD 7730 and up)
- AMD Radeon™ HD 7000M Series graphics for notebooks (HD 7730M and up)
- AMD A4/A6/A8/A10-7000 Series APUs (codenamed “Kaveri”)
- AMD A6/A8/A10 PRO-7000 Series APUs (codenamed “Kaveri”)
- AMD E1/A4/A10 Micro-6000 Series APUs (codenamed “Mullins”)
- AMD E1/E2/A4/A6/A8-6000 Series APUs (codenamed “Beema”)