Why GPUs for AI workloads?

There are many ways to arrive at an answer, and when discussing different processing technologies it really boils down to what you need to know, how you ask the question, and who can give you the answer you need the fastest (or at the lowest cost). If you need to know the orbital period of an extra-solar celestial body you should consult an astrophysicist, but if you need to count the number of green vs. blue marbles in a kiddie pool, a few dozen interns could get you the answer faster, and maybe cheaper too.


It is widely accepted and understood (at least on a basic level) that certain workloads benefit from certain processing technologies. Ask a random IT professional on the street what you should buy for your AI/ML workload, and nine times out of ten you will hear "a server with GPUs." But why is that?


First we need to understand, at least at a basic level, some of the architectural differences between a CPU and a GPU. The CPU is of course the focal point of the entire computing system. Modern CPUs have a complex architecture and extensive instruction sets with multiple levels of on-board cache. They typically have between 2 and 64 compute cores and execute instructions sequentially at high clock speeds. A CPU can handle complicated multi-step operations and delegate specific tasks to specialized subcomponents (such as a GPU), all while orchestrating the operation of the entire computer. The CPU excels at diverse, sequential workloads, especially those requiring a large amount of memory.


Among CPUs there is a fair bit of variance between vendors, but most today can be categorized as either CISC (Complex Instruction Set Computer) or RISC (Reduced Instruction Set Computer) processors. Arm is a greatly simplified RISC design requiring far fewer transistors, and therefore less space and power while producing less heat, making it the standard for mobile applications. Regardless of the flavor, the purpose is always the same: be the brain of the computer and perform all the data processing that cannot be offloaded to a peripheral component.


My first experience with GPU technology was a 3dfx Voodoo 2 card that I bought for my PC in 1998. I certainly didn’t know anything about AI algorithms back then, but it sure made Quake look great and run smoothly. It featured a modest 3 graphics cores, but that was 3X the number of cores my Pentium II had, and that, along with a dedicated frame buffer, texture mappers, and a number of other features, made it a huge upgrade.



3Dfx Voodoo 2 in SLI


The GPU is a discrete processing unit that can be called by programming APIs to perform very specialized tasks. It does not do general-purpose processing like the CPU and must be explicitly utilized by software that supports it in order to function. As it turns out, one of the key features that makes a GPU good for rendering computer graphics can also be beneficial for other purposes. The graphics pipeline requires very fast calculation of the position, orientation, and other properties of millions of polygonal objects simultaneously, which is not all that different from how many machine learning algorithms operate today. While my humble Voodoo 2 card only had a few cores, today’s GPUs feature many thousands of them. The high number of cores allows a tremendous degree of parallelism for workloads that can leverage it. Each individual core runs at a much slower speed than a CPU core and has access to far less memory, but the aggregate computational throughput can be greatly increased.
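The overlap between the two workloads can be sketched in a few lines. As an illustration only (NumPy standing in for actual GPU hardware, with made-up data), transforming millions of vertices is one batched matrix multiply rather than a per-vertex loop, and that same primitive sits at the heart of a neural network layer:

```python
import numpy as np

# A million 3-D vertices, one per row — the kind of bulk data a
# graphics pipeline (or an ML layer) processes in parallel.
vertices = np.random.rand(1_000_000, 3).astype(np.float32)

# A single 3x3 rotation matrix: 90 degrees about the z-axis.
theta = np.pi / 2
rotation = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
], dtype=np.float32)

# One batched matrix multiply transforms every vertex at once.
# On a GPU, each of the thousands of cores would handle a slice of
# the rows concurrently; a CPU loop would walk them one at a time.
transformed = vertices @ rotation.T

print(transformed.shape)  # same million rows, now rotated
```

A fully connected neural network layer is structurally the same operation: a large input matrix multiplied by a weight matrix, row by independent row, which is why both workloads map so naturally onto the same massively parallel hardware.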


The first non-graphics application for GPUs that I became aware of was crypto mining in the early 2010s. Much to the chagrin of PC gamers everywhere, the Bitcoin gold rush rapidly drove up demand and prices for consumer-grade GPU hardware. For better or worse, the distributed nature of the blockchain and the cryptographic calculations required to maintain the ledger were a perfect fit for the GPU. In the data center, early use cases were virtual desktops for engineering and power users, and while there are still products marketed to that segment today, GPUs are now best known for their prowess in various AI applications.


The performance delta between a top-bin server CPU and a high-end commercial GPU is quite significant, as indicated by the benchmark scores below from a few years ago.



While a 7-11x improvement is quite substantial, one might expect closer to 50x purely as a function of the number and speed of processing cores. There are, however, a number of other bottlenecks and losses when moving such a great volume of data through the system, so performance does not scale linearly with core count. Remarkably, an entry-level laptop GPU of the same vintage outperforms the Xeon 6148 CPU on these benchmarks, and while it has far fewer (and slower) cores, it still delivers 50% more performance per core than the enterprise-grade Tesla V100 in this particular test.
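One classic way to reason about why adding cores does not scale linearly is Amdahl's law: whatever fraction of the work is serial (or stuck waiting on data movement) caps the total speedup no matter how many cores you add. A quick back-of-the-envelope sketch, using hypothetical fractions rather than figures from the benchmark above:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speedup when only part of a workload parallelizes.

    Amdahl's law: S = 1 / ((1 - p) + p / n), where p is the fraction
    of runtime that can run in parallel and n is the core count.
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even with 5,000 cores, a workload that is only 90% parallelizable
# tops out near 10x — the serial 10% dominates everything else.
for p in (0.50, 0.90, 0.99):
    print(f"p={p:.2f}: {amdahl_speedup(p, 5000):.1f}x")
```

The takeaway matches the benchmark intuition: moving data through memory, PCIe, and the serial portions of the pipeline eats into the raw core-count advantage, so real-world gains land well below the naive cores-times-clock-speed estimate.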


Regardless of where the point of diminishing returns is, it is impossible to deny that GPU technology is vastly superior for most AI-style workloads. Much like counting and comparing colored marbles in a pool, AI demands massive numbers of repetitive operations and bulk processing of large data sets, and the GPU's unique architecture handles that work at the scale a competitive business requires. The explosive growth of AI has encouraged companies like Nvidia and AMD to produce offerings even more highly specialized for that market, so much so that while they are still called GPUs, they have a very different architecture than their gaming- and graphics-oriented brethren. These models typically forgo things like shaders and texture mapping units in favor of increased VRAM and core count. The GPU will of course never replace the venerable CPU, but it is definitely a critical component for enabling efficient processing of AI and ML workloads.


So now that we have established the benefits of GPUs for AI/ML and other datacenter workloads what is next? The flow chart below highlights some options from the Dell PowerEdge portfolio and what some of the recommended use cases would be.

GPUs have come an incredibly long way since their early days, even exceeding Moore’s law. Nvidia’s RTX 3080 was designed for gaming enthusiasts yet has 8,704 cores, roughly 70% more than their previous-generation flagship datacenter card, the Tesla V100. Their current-gen H100 nearly doubles that again with almost 17,000, and AMD is once again competing in the market with its Instinct MI100 accelerators. With demand for AI-enabled applications continuing to grow and electronic gaming more popular than ever, the future of the GPU industry is very exciting indeed.


#IWORK4DELL


Opinions expressed in this article are entirely our own and may not be representative of the views of Dell Technologies.
