AMD Launches 64 Core EPYC CPU – Supporting A Paradigm Shift in Scientific Progress

AMD recently announced the EPYC 7H12, the flagship of the second generation of its EPYC datacenter CPU lineup. It brings an incredible performance leap to the market at an unbeatable price point.

The 7H12 is one of five new datacenter CPUs in AMD's EPYC range, the cheapest of which sport 48 cores and 96 threads each. Compared to Intel's best offering, which tops out at 28 cores, this is a serious smackdown of the computing giant.

The chiplet design is one of the key reasons AMD's CPUs can scale up to a massive 64 cores. The chiplets that do the actual processing are made on the 7 nm process, while the I/O die is made on the cheaper 14 nm process, resulting in lower cost and higher performance.

How did AMD get this far ahead?

The CPU market has stagnated in recent years. Intel has had a firm grasp on the high-end computing industry for over seven years now. During that time, AMD struggled to present anything that could challenge Intel's high-end chips, so it focused instead on bringing competitively priced CPUs to the mid- and low-end consumer market. That kept the company afloat, but it also meant Intel wasn't pressured to improve its architecture by much, having been comfortably in the lead for so long.

AMD has therefore been able to steadily develop and refine its 7 nm process and its modular CPU architecture almost under the radar, creating a far faster and more scalable design than anything Intel has available. Intel hasn't had a proper response to the third-generation Ryzen CPUs or the new EPYC CPUs built on this still relatively new architecture. AMD's modular, chiplet-style CPUs are finally stretching their wings as the process becomes more and more refined.

How do the two computing giants compare?

In terms of specs, comparing AMD's new EPYC 7H12 to Intel's best offering, the Xeon Platinum 8280, is like comparing an iPhone 11 to a BlackBerry from 2012. It's that bad.

(credit: LTT)

                   EPYC 7H12                              Xeon Platinum 8280
Core Count         64 cores / 128 threads                 28 cores / 56 threads
L3 Cache           256 MB                                 38.5 MB
Max Memory         4 TB                                    1 TB
Memory Channels    8                                       6
PCI-E Lanes        128x PCI-E 4.0 (eq. 256x PCI-E 3.0)    48x PCI-E 3.0
Price              $7,219 (retail)                         $10,009 (per 1,000 units)*

* The retail price for a single Xeon Platinum 8280 is astronomical, so this comparison uses batch pricing, i.e. the price you get when ordering 1,000 units. The actual price difference is therefore even greater than the table shows.

The core count difference alone means the 7H12 will dramatically outperform the 8280: more than two times faster in applications that utilize all cores, with all other parameters equal. The larger L3 cache lets bigger batches of data be processed at once, speeding up applications that crunch lots of data or run heavy simulations, and the 4 TB memory ceiling supports that. More memory channels mean higher bandwidth between memory and CPU. And 128 PCI-E 4.0 lanes is plainly ridiculous (as if anything on the 7H12 isn't): in practice it equals 256 PCI-E 3.0 lanes, which dwarfs Intel's offering of just 48 PCI-E 3.0 lanes.
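
To put that lane comparison in concrete numbers, here is a back-of-the-envelope sketch in Python. It assumes the commonly quoted per-lane throughput of roughly 0.985 GB/s for PCI-E 3.0 and 1.969 GB/s for PCI-E 4.0 (one direction); exact figures vary slightly by source.

```python
# Back-of-the-envelope aggregate PCI-E bandwidth (one direction, per CPU),
# using the commonly quoted per-lane throughput figures.
PCIE3_GBPS_PER_LANE = 0.985   # PCI-E 3.0: 8 GT/s, 128b/130b encoding
PCIE4_GBPS_PER_LANE = 1.969   # PCI-E 4.0: 16 GT/s, double PCI-E 3.0

xeon_bw = 48 * PCIE3_GBPS_PER_LANE     # Xeon Platinum 8280: 48x PCI-E 3.0
epyc_bw = 128 * PCIE4_GBPS_PER_LANE    # EPYC 7H12: 128x PCI-E 4.0

print(f"Xeon Platinum 8280: ~{xeon_bw:.0f} GB/s")
print(f"EPYC 7H12:          ~{epyc_bw:.0f} GB/s ({epyc_bw / xeon_bw:.1f}x)")
```

The aggregate comes out at roughly 47 GB/s for the Xeon versus about 252 GB/s for the EPYC, which is where the "equivalent to 256 PCI-E 3.0 lanes" framing comes from.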

Science and AI Development

A sight to behold – 8x GTX 1080 Tis working in tandem

The new 7H12 offers far more computing capability and a several-fold increase in bandwidth. This can drastically boost development in fields such as medicine and physics, which rely on these kinds of CPUs to simulate and test new ideas and concepts. Protein synthesis simulations and cancer research are both problems where CPUs are heavily utilized to test solutions.

Previous architectures have been limiting the potential of multi-GPU solutions

A subfield of AI, deep learning, relies heavily on GPU computing power. Scaling systems beyond 2-3 GPUs has been a challenge because of the limited bandwidth of today's architectures. A single GPU needs 16 PCI-E lanes to operate at full capacity, and the newest GPUs need even more as their memory bandwidth has grown. Put 4 GPUs in a system with only 48 PCI-E 3.0 lanes and they have to share: you end up with cards running at x16, x16, x8 and x8. The GPUs aren't fed data fast enough, many cores sit idle, and performance is literally left on the table.
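
That x16/x16/x8/x8 split is just lane arithmetic. As a rough illustration (real motherboards hard-wire their link widths, and this greedy helper is purely hypothetical), the same budget math looks like this:

```python
def split_lanes(total_lanes, gpu_count, full_link=16, min_link=8):
    """Illustrative greedy split of a CPU's PCI-E lane budget across GPUs.

    Real boards hard-wire their link widths; this just reproduces the
    budget arithmetic: once lanes run out, links get narrowed below x16.
    """
    widths, remaining = [], total_lanes
    for gpus_left in range(gpu_count, 0, -1):
        width = full_link
        # Narrow this link until every GPU still waiting can get min_link lanes.
        while width > min_link and remaining - width < min_link * (gpus_left - 1):
            width //= 2
        widths.append(width)
        remaining -= width
    return widths

# 4 GPUs on a 48-lane PCI-E 3.0 platform: two cards get squeezed down to x8.
print(split_lanes(48, 4))    # [16, 16, 8, 8]

# 8 GPUs on a 128-lane PCI-E 4.0 platform: every card keeps a full x16 link.
print(split_lanes(128, 8))   # [16, 16, 16, 16, 16, 16, 16, 16]
```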

CNNs rely heavily on GPUs to handle demanding visual recognition tasks in areas like self-driving cars, medical diagnostic imaging, and defect detection in composite materials and aerospace heat shielding.
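
For a sense of what such a workload looks like in code, here is a minimal PyTorch sketch of a CNN forward pass spread across all visible GPUs. It isn't tied to any particular CPU, and the model and batch size are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# A small CNN sharded across every visible GPU. DataParallel splits each
# batch over the available devices, so every GPU's PCI-E link has to carry
# its slice of the input data on each training step.
model = resnet18(num_classes=10)
if torch.cuda.is_available():
    model = nn.DataParallel(model).cuda()

images = torch.randn(256, 3, 224, 224)        # one synthetic batch of images
if torch.cuda.is_available():
    images = images.cuda()

logits = model(images)                         # forward pass, split across GPUs
print(logits.shape)                            # torch.Size([256, 10])
```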

Increased Bandwidth – More Performance

With 128 PCI-E 4.0 lanes, this is no longer a problem. Even the most modern, bandwidth-hungry GPUs can't fully saturate an x16 PCI-E 4.0 link (which carries double the bandwidth of x16 PCI-E 3.0), so you can have 8 modern GPUs working in tandem without limitations, compared to 2-3 in the past. In practice it's even feasible to run 16 GPUs in tandem, since each would still get the bandwidth of a classic x16 PCI-E 3.0 link with minimal loss. What all this jargon means in practice is that AI models can be developed at over four times the previous pace, not because of any increase in GPU computing power, but because the CPU bottleneck that restricted single-machine vertical scaling is gone. With high-speed PCI-E 4.0 storage also coming to market, AI will progress at an unreasonably fast pace in the next few years. This technology makes powerful AI models far more accessible to smaller businesses and research divisions.
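
As a sanity check on those scaling claims, the sketch below (assuming an even split of the lane budget and the usual "one 4.0 lane ≈ two 3.0 lanes" rule of thumb) expresses the per-GPU link in PCI-E 3.0 equivalent lanes:

```python
# Rule-of-thumb per-GPU link width, expressed in PCI-E 3.0 equivalent lanes,
# assuming the CPU's lane budget is split evenly across GPUs and one
# PCI-E 4.0 lane carries roughly the bandwidth of two PCI-E 3.0 lanes.
def pcie3_equivalent_per_gpu(total_lanes, gpu_count, gen):
    lanes = min(16, total_lanes // gpu_count)   # a GPU uses at most an x16 link
    return lanes * (2 if gen == 4 else 1)

for gpus in (8, 16):
    xeon = pcie3_equivalent_per_gpu(48, gpus, gen=3)
    epyc = pcie3_equivalent_per_gpu(128, gpus, gen=4)
    print(f"{gpus:>2} GPUs -> Xeon 8280 (48x 3.0): ~x{xeon} eq. | "
          f"EPYC 7H12 (128x 4.0): ~x{epyc} eq.")
```

Even at 16 GPUs, each card on the EPYC platform still sees roughly the bandwidth of a classic x16 PCI-E 3.0 link, while on a 48-lane platform it would be down to about x3.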

