How AMD’s APUs can power energy-efficient supercomputers

Last month, students from Bentley and Northeastern Universities entered the prestigious Student Cluster Competition at the Supercomputing Conference in Denver, Colorado. The team competed in the Commodity Track, which had strict power and budget constraints that forced competitors to use off-the-shelf hardware. To meet those constraints, the team chose AMD’s A10 Accelerated Processing Units (APUs) as the foundation for its cluster.

Not only did the team win, it also beat a number of teams that had double the power available to them and an unlimited budget. To see what they did, read AMD A-Series APUs power Bentley and Northeastern university students’ Cluster Competition submission to glory at Super Computing 2013.

The Bentley and Northeastern students achieved this outstanding feat through a combination of hard work and picking the right tools for the job. The team was the only one to use APUs; the other competitors relied on traditional CPUs, with two teams pairing them with consumer GPUs.

Since 2011, AMD’s A-Series APUs have provided compelling performance and value for money by combining multiple x86 CPU cores and a Radeon GPU on a single chip. APUs deliver stunning visuals and immense, power-efficient compute capability while supporting the OpenCL™ programming language – a great combination for desktops, laptops and tablets. Students from Bentley and Northeastern have now shown that APUs can also serve as the basis for a cost-effective, power-efficient high-performance computing cluster.

But what makes an APU good for HPC applications? AMD’s Shankar Viswanathan, an engineer who has worked on several generations of AMD’s APU architecture and advised the Bentley and Northeastern team, explained some of the hardware and software details that make an AMD APU a compute powerhouse.

  • The APU’s memory architecture – “The APU allows very low latency transfer of data processing from CPU to GPU and vice versa. Due to the APU’s implementation of an on-die memory controller, the latency is lower than if it were on a discrete GPU that used the PCI-Express bus.” (A minimal zero-copy OpenCL sketch illustrating this appears after the list.)
  • AMD Turbo Core – This allows power to be shifted between CPU and GPU depending on load. Viswanathan explains, “This technology allowed the Bentley and Northeastern team to use the same hardware for both the CPU intensive application as well as the GPU focused one. In one of the applications they even did some initial serial processing on the CPU and then enabled the GPU to do all the data parallel operations. Turbo core allowed them to really boost the speed when only one of the CPU or GPU was active.”
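
Viswanathan’s first point is about locality: on an APU the CPU cores and the GPU sit behind the same on-die memory controller, so work can be handed to the GPU without copying data across a PCI-Express bus. The sketch below is a minimal, illustrative OpenCL host program, not the team’s code, showing one common way to use that on an APU: allocate a host-visible buffer and map it so both processors work on the same allocation. It assumes an OpenCL runtime such as the one shipped with the AMD APP SDK, and it omits error handling for brevity.

```c
/* Illustrative only: zero-copy style buffer sharing on an APU.
 * Assumes an OpenCL runtime (e.g. from the AMD APP SDK) is installed.
 * Error handling is omitted to keep the sketch short. */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

    /* A buffer backed by host-visible memory: on an APU the integrated GPU
     * can read it in place instead of pulling it over PCI-Express. */
    const size_t n = 1 << 20;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                n * sizeof(float), NULL, NULL);

    /* The CPU fills the buffer through a mapped pointer ... */
    float *p = (float *)clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                           0, n * sizeof(float), 0, NULL, NULL, NULL);
    for (size_t i = 0; i < n; ++i)
        p[i] = (float)i;
    clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);

    /* ... and a kernel enqueued here would consume the same allocation,
     * with no explicit clEnqueueWriteBuffer copy needed. */
    clFinish(queue);

    clReleaseMemObject(buf);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    printf("Buffer prepared in memory visible to both CPU and GPU.\n");
    return 0;
}
```

On a discrete card the same code would still run, but the runtime would typically stage the data across the bus; on an APU the map/unmap pattern is the kind of low-latency hand-off Viswanathan describes.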


Going into detail on AMD Turbo Core, Viswanathan said, “The technology allows CPU clock speeds to be increased dynamically when encountering execution scenarios that are primarily suited to the CPU. Turbo Core can also boost GPU frequency when executing GPU-intensive tasks. This versatility is crucial in a dollar- and power-constrained environment such as the Student Cluster Competition, but the benefit can extend to some HPC workloads in industry and academia.”
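
Turbo Core is handled entirely by the processor and its firmware, so there is no API for applications to call; you can only observe its effect. The hypothetical sketch below samples the standard Linux cpufreq interface before and during a CPU-bound loop. The sysfs path is the common one, but whether boosted clocks are reported there depends on the kernel and driver in use, so treat this purely as an experiment.

```c
/* Hypothetical experiment: watch the reported core clock rise while a
 * CPU-bound phase runs. Turbo Core itself is managed in hardware; this
 * only reads the Linux cpufreq interface, and whether boost frequencies
 * appear there is kernel/driver dependent (an assumption). */
#include <stdio.h>

static long read_cpu0_khz(void) {
    long khz = -1;
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", "r");
    if (f) {
        if (fscanf(f, "%ld", &khz) != 1)
            khz = -1;
        fclose(f);
    }
    return khz;
}

int main(void) {
    printf("core 0 before load: %ld kHz\n", read_cpu0_khz());

    /* A serial, CPU-intensive phase: the sort of work during which Turbo
     * Core can shift power toward the CPU cores and raise their clocks. */
    volatile double acc = 0.0;
    for (long i = 1; i <= 200000000L; ++i)
        acc += 1.0 / (double)i;

    printf("core 0 under load:  %ld kHz (checksum %.3f)\n", read_cpu0_khz(), acc);
    return 0;
}
```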


And with AMD’s future APUs sporting Heterogeneous System Architecture (HSA) features, Viswanathan said that the APU will offer great performance potential for the HPC community. “One of the main advantages of using AMD APUs for HPC workloads with HSA technology is that it allows both the CPU cores and the GPU to collaboratively process huge data sets, thus accelerating the program execution.”
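
HSA’s key idea is a shared virtual address space between the CPU cores and the GPU. For developers, that capability later surfaced through OpenCL 2.0’s shared virtual memory (SVM) API, so the hedged sketch below uses that to illustrate the point; it is not code from the article, it assumes an OpenCL 2.0 capable APU driver, and it shows only the allocation side (a full program would also build a kernel and hand it the pointer with clSetKernelArgSVMPointer).

```c
/* Hedged illustration of HSA-style shared data: one allocation that both
 * the CPU and the GPU can address. Requires an OpenCL 2.0 runtime and a
 * device with fine-grained SVM support (assumptions, not facts from the
 * article). */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    /* Fine-grained SVM: the CPU can touch the buffer directly. With
     * coarse-grained SVM you would wrap host access in
     * clEnqueueSVMMap / clEnqueueSVMUnmap instead. */
    const size_t n = 1 << 20;
    float *data = (float *)clSVMAlloc(ctx,
                                      CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER,
                                      n * sizeof(float), 0);
    if (!data) {
        fprintf(stderr, "SVM allocation failed (device may lack fine-grained SVM)\n");
        clReleaseContext(ctx);
        return 1;
    }

    /* Serial setup on the CPU ... */
    for (size_t i = 0; i < n; ++i)
        data[i] = (float)i;

    /* ... a GPU kernel given this pointer via clSetKernelArgSVMPointer()
     * would then process the very same data set, no copies required. */
    printf("first element prepared by the CPU: %f\n", data[0]);

    clSVMFree(ctx, data);
    clReleaseContext(ctx);
    return 0;
}
```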


Having top-notch hardware is only part of the formula when squeezing optimal performance out of an HPC cluster; software also plays a crucial role. Viswanathan pointed out three tools that can help developers extract the most from an AMD APU.

  • AMD Core Math Library – a free, optimised and threaded math library that provides functions such as BLAS, LAPACK, FFTs and random number generators (a minimal BLAS call is sketched after this list);
  • AMD APP SDK – helps developers to leverage programming languages such as OpenCL, Bolt, C++ AMP, or Aparapi to access the compute power within the GPU;
  • AMD OpenCL driver – optimised for AMD APUs.
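
As a small illustration of the first tool in the list, the snippet below performs a double-precision matrix multiply through the conventional Fortran-style dgemm_ symbol that optimised BLAS libraries, ACML among them, export. The trailing-underscore name, the column-major layout and the link flag (for example -lacml, or -lblas for a generic BLAS) are platform conventions assumed here, not details taken from the article.

```c
/* Minimal BLAS sketch: C = alpha * A * B + beta * C via the Fortran-style
 * dgemm_ entry point exported by tuned BLAS libraries such as ACML.
 * Build with something like:  gcc dgemm_demo.c -lacml   (or -lblas). */
#include <stdio.h>

/* Fortran calling convention: every argument by pointer, column-major data. */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);

int main(void) {
    const int n = 2;
    const double alpha = 1.0, beta = 0.0;
    double a[4] = { 1.0, 3.0, 2.0, 4.0 };  /* column-major [[1,2],[3,4]] */
    double b[4] = { 5.0, 7.0, 6.0, 8.0 };  /* column-major [[5,6],[7,8]] */
    double c[4] = { 0.0, 0.0, 0.0, 0.0 };

    dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    /* Expected product: [[19, 22], [43, 50]] */
    printf("C = [[%g, %g], [%g, %g]]\n", c[0], c[2], c[1], c[3]);
    return 0;
}
```

Because the interface is standard, swapping in a tuned library such as ACML is a link-time decision rather than a source change.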


AMD’s A-Series APUs have long been compelling processors for desktops, laptops and tablets, but the combination of a powerful multi-core x86 CPU and a Radeon GPU on a single die, coupled with tremendous software support, also makes them a great choice for HPC workloads. With HSA, APUs will deliver even more compelling, power-efficient performance at an affordable price.

Lawrence Latif is a blogger and technical communications representative at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.
