Powering clustered AI processors
Vertical power enables up to 100kA of current and multi-rail core voltage delivery. Article written by Paul Yeaman, Senior Director, Applications Engineering.
Recently introduced supercomputers built from clustered AI ASIC processors are pushing power delivery networks (PDNs) to levels that were unimaginable just a few years ago. With currents approaching 100 kA per ASIC cluster, innovation is needed across power system architectures, topologies, control systems and packaging to deliver such high currents. At these power levels, 48V power delivery is essential: for a given power, distribution current falls as bus voltage rises, and conduction losses fall with the square of that current. Furthermore, tightly packed processor clusters limit the feasibility of lateral power delivery, so a new approach is necessary.
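To make the bus-voltage argument concrete, the short Python sketch below compares the current and conduction loss for the same delivered power at 12V and 48V through the same distribution resistance. The 800W load and 1mΩ path resistance are hypothetical values chosen only for illustration; what matters is the scaling, since loss goes as I²R, so quadrupling the bus voltage cuts current by 4× and loss by 16×.

```python
def bus_current_a(power_w: float, bus_voltage_v: float) -> float:
    """Current drawn from a bus to deliver a given power (I = P / V)."""
    return power_w / bus_voltage_v


def conduction_loss_w(current_a: float, resistance_ohm: float) -> float:
    """Resistive (I^2 * R) loss in the distribution path."""
    return current_a ** 2 * resistance_ohm


# Hypothetical values, chosen only to illustrate the scaling.
POWER_W = 800.0     # assumed per-processor load power
PATH_R_OHM = 1e-3   # assumed distribution-path resistance (1 mOhm)

for bus_v in (12.0, 48.0):
    i = bus_current_a(POWER_W, bus_v)
    loss = conduction_loss_w(i, PATH_R_OHM)
    print(f"{bus_v:>4.0f} V bus: {i:6.2f} A, {loss:6.3f} W lost")
```

With these assumed values the 12V bus carries about 66.7A and loses about 4.4W, while the 48V bus carries about 16.7A and loses about 0.28W.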
Vicor's 48V direct-to-load (<1V) Factorized Power Architecture (FPA™) is a major departure from the common 48V intermediate bus architecture (IBA), which consists of an intermediate bus converter followed by multiphase point-of-load (PoL) regulators. FPA uniquely addresses each of the power delivery challenges facing clustered processor systems and also enables vertical power delivery (VPD), which is essential for supplying high currents to such systems.
Clustered power delivery challenges
Clustered ASICs are tightly packed to provide the high-speed bandwidth needed for the teraflops of processing performance that AI training workloads, such as those for autonomous driving, demand. Each processor in the cluster can itself require 600 to 1000A. Even on a single-processor accelerator card, that current causes significant PCB or substrate impedance losses unless the voltage regulator (VR) is placed physically close to the processor power pins.
Additionally, the rapid advancements in artificial intelligence (AI) are being enabled by GPUs and specialized AI processors built on 7nm, 5nm and, soon, 3nm silicon process nodes. Nominal core operating voltages at these nodes are currently between 0.75 and 0.85V. To meet the performance demands of AI workloads, GPUs and processors are mounted on accelerator cards, which are then clustered into rack-based server systems with four or eight cards per rack for data centers and high-performance computers. However, recent introductions from Cerebras and Tesla have shown an alternative approach: clustering the AI ASICs themselves. This enables extremely fast, high-density supercomputers but presents significant additional power delivery and thermal management/cooling challenges.
For power delivery, the ASIC/GPU cluster leaves no room for lateral power delivery as used on single- or dual-processor AI cards. In addition, the high-speed I/O is extremely sensitive to the high-current switching noise produced by hard-switching multiphase buck regulators. Moving the hard-switching multiphase VR even closer to the processor brings its noise along with it, which further compounds the challenge of designing a PDN suitable for the noise-sensitive I/O. At a typical design value of 40–60A per phase, the number of discrete phases needed to deliver high peak currents (>1500A per core in many cases) can easily exceed 30 per AI ASIC or GPU, a number that is difficult, if not impossible, to achieve with lateral power delivery.
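The phase-count arithmetic can be checked with a few lines of Python. This is a minimal sketch assuming each phase is loaded exactly to its design current, with no derating or current-sharing margin; the function name `phases_needed` is hypothetical. For a 1500A peak it gives 38 phases at 40A per phase, 30 at 50A and 25 at 60A, and any peak above 1500A pushes the count higher.

```python
import math


def phases_needed(peak_current_a: float, amps_per_phase: float) -> int:
    """Smallest whole number of phases whose combined design current covers the peak."""
    if peak_current_a <= 0 or amps_per_phase <= 0:
        raise ValueError("currents must be positive")
    return math.ceil(peak_current_a / amps_per_phase)


# 1500 A peak and the 40-60 A/phase design range from the text.
for per_phase in (40, 50, 60):
    print(f"{per_phase} A/phase -> {phases_needed(1500, per_phase)} phases")
```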
Factorized Power unlocks new levels of current delivery
Factorized Power Architecture™ is based on the fundamental principle of dividing a power converter into two primary functions, optimizing each separately and then implementing those functions as a system. The two functions are regulation and current multiplication.
The efficiency of a regulator decreases as the work it performs increases. The closer a regulator's input and output voltages are to each other, the less work it performs and the higher its efficiency. Because the regulator sits ahead of the current multiplier and converts the 48V input to an intermediate voltage rather than all the way down to the sub-1V core voltage, FPA™ minimizes the regulator's input-to-output voltage differential. The PRM™ regulator uses a zero-voltage switching (ZVS) buck-boost topology, which is highly efficient when the input and output voltages are close. ZVS greatly reduces switching losses, enabling high-frequency operation and a much smaller converter. The PRM typically regulates an input of 40 to 60V to an output of 30 to 50V.
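One rough way to see how small the PRM's job is: compare its voltage conversion ratio and relative input-to-output differential with those of a conventional PoL buck stage. The sketch below assumes a 48V-to-40V PRM operating point (within the 40–60V input and 30–50V output ranges above) and a 12V-to-0.8V PoL stage in an IBA. Both operating points are illustrative assumptions, not datasheet values, and the relative differential is only a proxy for the "work" described above, not an efficiency prediction.

```python
def conversion_ratio(v_in: float, v_out: float) -> float:
    """Output-to-input voltage ratio of a converter stage."""
    return v_out / v_in


def relative_differential(v_in: float, v_out: float) -> float:
    """Input-to-output voltage difference as a fraction of the input voltage."""
    return abs(v_in - v_out) / v_in


# Illustrative operating points (assumed, not taken from datasheets).
stages = {
    "PRM, 48 V -> 40 V": (48.0, 40.0),
    "IBA PoL buck, 12 V -> 0.8 V": (12.0, 0.8),
}

for name, (v_in, v_out) in stages.items():
    print(f"{name}: ratio {conversion_ratio(v_in, v_out):.3f}, "
          f"differential {relative_differential(v_in, v_out):.1%} of input")
```

Under these assumptions the PRM's output stays within about 17% of its input, whereas the PoL buck must drop about 93% of its input voltage.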
Soft switching and current multiplication
The PRM is followed by a second stage that performs voltage step-down and current step-up. This stage is implemented using the Sine Amplitude Converter (SAC™) topology in a device called a VTM™ Current Multiplier. The VTM behaves like an ideal transformer: its input and output voltages are related by a fixed ratio, and its impedance remains low (hundreds of µΩ) beyond 1MHz.
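A simple way to capture this behavior is a first-order model: an ideal transformer with voltage ratio K, in series with a small output resistance R_OUT. The Python sketch below assumes K = 1/48 and R_OUT = 300µΩ (hypothetical values consistent with the "hundreds of µΩ" figure above) and ignores no-load losses, so it is a rough illustration rather than a device model.

```python
from dataclasses import dataclass


@dataclass
class VtmModel:
    """First-order current-multiplier model: ideal transformer plus output resistance."""
    k: float      # fixed voltage ratio V_out / V_in (assumed value)
    r_out: float  # output resistance in ohms (assumed value)

    def v_out(self, v_in: float, i_out: float) -> float:
        # Ideal transformer output, minus the resistive drop at this load current.
        return self.k * v_in - i_out * self.r_out

    def i_in(self, i_out: float) -> float:
        # Ideal-transformer power balance: input current is K times output current,
        # i.e. the output current is the input current multiplied by 1/K.
        return self.k * i_out


vtm = VtmModel(k=1 / 48, r_out=300e-6)  # hypothetical K and R_OUT
print(f"V_out at 40 V in, 100 A out: {vtm.v_out(40.0, 100.0):.4f} V")
print(f"Input current for 100 A out: {vtm.i_in(100.0):.3f} A "
      f"(x{1 / vtm.k:.0f} current multiplication)")
```

With these assumed values, 40V in and 100A out gives roughly 0.80V at the output while drawing only about 2.1A from the input.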
Since there is no energy storage in the VTM, it can provide large amounts of power if it is kept sufficiently cool. This allows for matching the power capability of the VTM with the thermal capability of the processor.
The SAC topology uses a zero-voltage and zero-current switching control system, further reducing switching noise and power losses.
Together, the PRM and VTM form the building blocks of FPA. One is dedicated to regulation and the other dedicated to transformation and current multiplication.
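Because the VTM only transforms, the PRM's role in this chain is to set the intermediate voltage that, after division by K and the VTM's resistive drop, delivers the processor's target core voltage. The sketch below inverts a first-order VTM model (ideal ratio K plus output resistance) and assumes K = 1/48, an effective array output resistance of 50µΩ and a 600A load; the function names and values are hypothetical. It checks each result against the 30–50V PRM output range quoted earlier, across the 0.75–0.85V core voltages cited for current process nodes.

```python
def prm_setpoint_v(v_load: float, i_load: float, k: float, r_out: float) -> float:
    """Voltage the PRM must regulate to so the VTM output equals v_load at i_load.

    Inverts the first-order VTM model: v_load = k * v_prm - i_load * r_out.
    """
    return (v_load + i_load * r_out) / k


def prm_output_current_a(i_load: float, k: float) -> float:
    """Current the PRM supplies to the VTM (the VTM multiplies it by 1/k)."""
    return k * i_load


PRM_RANGE_V = (30.0, 50.0)  # typical PRM output range quoted in the text
K = 1 / 48                  # assumed VTM ratio
R_OUT = 50e-6               # assumed effective output resistance of the VTM array
I_LOAD = 600.0              # assumed load current

for v_core in (0.75, 0.80, 0.85):
    v_prm = prm_setpoint_v(v_core, I_LOAD, K, R_OUT)
    in_range = PRM_RANGE_V[0] <= v_prm <= PRM_RANGE_V[1]
    print(f"core {v_core:.2f} V -> PRM set to {v_prm:.2f} V "
          f"({'within' if in_range else 'outside'} range), "
          f"PRM current {prm_output_current_a(I_LOAD, K):.1f} A")
```

In this sketch the PRM setpoint moves between roughly 37.4V and 42.2V as the core voltage changes, while the PRM itself carries only 12.5A; the VTM does no regulating of its own.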
SM-ChiP package reduces noise and improves thermals
While the topology and architecture used to implement a high-performance regulator are important, of equal importance is the packaging technology. The Vicor SM-ChiP™ package integrates everything—passives, magnetics, FETs and control—into a single device. Moreover, this package is engineered to enable the most efficient extraction of current at the lowest thermal impedance to facilitate cooling. Many SM-ChiPs also include grounded metal shielding over a significant surface of the device. This serves not only to facilitate cooling but also to localize high-frequency parasitic currents to keep them from propagating outside the device.
Vertical power delivery cuts PDN losses by 95%
Lateral power delivery is almost impossible for large clustered processor arrays. The better solution is vertical power delivery (VPD). In VPD, the current multiplier sits directly underneath the processor on the opposite side of the board, significantly reducing PDN losses by shortening the distance the current travels through the motherboard. To do this, VPD needs two key features.
First, the area directly under the processor must accommodate the high-frequency capacitors needed to decouple very high-frequency currents (>10MHz) from the rest of the system. Second, for maximum efficiency, the physical location and pattern of the current exiting the VPD solution must exactly mirror the location and pattern of the processor core power inputs. This enables the high-current flow to follow a true “vertical” path.
To achieve these features, the Vicor VPD solution is an integrated module consisting of three layers: a VTM Current Multiplier array implemented with a gearbox below it and a PRM Regulator mounted above it, providing a completely regulated 48V-to-load solution for each processor, a DCM™. The gearbox performs two functions: it incorporates the high-frequency decoupling capacitance, and it redistributes the current from the VTM into a pattern mirroring the processor above it. The VTM array is sized according to the processor's current requirement, and the PRM according to its power requirement. If the GPU or ASIC requires multiple power rails, the VTM and PRM layers can be implemented with independent PRMs and VTMs sized to meet the current and power requirements of each rail.
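The per-rail sizing rule can be sketched as follows: the number of VTMs follows from each rail's current, and the number of PRMs from its power. The per-module ratings (150A per VTM, 500W per PRM), the rail names and the rail values below are all hypothetical, and the sketch ignores the conversion losses and derating a real design would include.

```python
import math
from dataclasses import dataclass


@dataclass
class Rail:
    name: str         # hypothetical rail name
    voltage_v: float  # rail voltage at the processor
    current_a: float  # peak rail current


def size_rail(rail: Rail, vtm_max_a: float, prm_max_w: float) -> tuple[int, int]:
    """Return (VTM count, PRM count) for one rail.

    VTMs are sized by current and PRMs by power; losses and derating are ignored.
    """
    vtms = math.ceil(rail.current_a / vtm_max_a)
    prms = math.ceil(rail.voltage_v * rail.current_a / prm_max_w)
    return vtms, prms


VTM_MAX_A = 150.0  # hypothetical per-VTM current rating
PRM_MAX_W = 500.0  # hypothetical per-PRM power rating

rails = [Rail("core", 0.80, 1000.0), Rail("aux", 0.75, 200.0)]
for rail in rails:
    vtms, prms = size_rail(rail, VTM_MAX_A, PRM_MAX_W)
    print(f"{rail.name}: {vtms} VTMs, {prms} PRMs")
```

Because current and power scale differently, a low-voltage, high-current rail can need many VTMs but only a few PRMs (7 and 2 for the hypothetical core rail here), which is why the two layers are sized independently.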
Vicor's FPA™, ZVS and ZCS control, high-frequency SAC current multiplier topology and SM-ChiP packaging technology together provide all the elements needed to perfect VPD. The result solves the low-noise, clustered power delivery challenge while easing the mechanical design for cooling and thermal management through high efficiency and a thermally adept power module package. VPD is a true enabler for higher-performance AI systems, allowing the cluster to perform high-speed, massive data analytics that refine training models and advance machine learning to significantly higher levels.
A better way for high performance computing power
AI and machine learning are still in their infancy. This train will only pick up speed as the years go by, and that acceleration will require faster processing for more complex solutions. AI ASIC-based supercomputers will demand more power than conventional methods can possibly deliver. A new, innovative approach to power delivery is the only way the promise of AI can come to fruition, and it will require power system architectures, topologies, control systems and packaging working in concert to deliver ever-increasing currents. Vertical Power Delivery, leveraging current multiplication, is the solution of choice. It is a proven approach that meets the demands of high-performance computing today and can easily scale to keep pace with future needs. It is compact, efficient and can reduce PDN losses by up to 50%.
Paul Yeaman works extensively with technology leaders to develop and implement leading-edge power solutions in systems with some of the most demanding power requirements in the industry. With regular exposure to the power challenges posed by new technologies, Paul is aware of broad-based industry trends in power and works to ensure that innovators are able to incorporate power solutions that meet these demands. Paul has over 20 years of experience in the power electronics industry in both Design and Applications Engineering.
This article was originally published by Power System Design.