High-performance computing power FAQs
These FAQs cover the power system design issues associated with high-performance computing, and some innovative solutions currently available.
By Anish Jacob, Principal Field Application Engineer, HPC
What are the characteristics of high-performance chips?
Modern-day GPUs contain tens of billions of transistors. Better processor performance comes at the price of exponentially rising power demands, so high-performance processors for applications like artificial intelligence (AI) and machine learning (ML) demand ever more power. Meanwhile, core voltages are declining with advanced process nodes, which further increases current draw.
As peak currents of up to 2,000A are now becoming typical, some xPU companies are evaluating multi-rail options where the main core power rails are split into five or more lower-current power inputs.
Additionally, the highly dynamic nature of machine learning workloads results in the chips imposing high di/dt transients lasting several microseconds. These transients create stress across the power delivery network (PDN) of a high-performance processor module or accelerator card.
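To illustrate the effect of such transients, the sketch below estimates the voltage droop a fast current step induces across the PDN's parasitic resistance and inductance (V = I·R + L·di/dt). The resistance, inductance, step size and edge time are illustrative assumptions, not figures from any specific processor or module.

```python
# Hedged sketch: voltage droop at the load for a current step across
# assumed PDN parasitics. All numeric values are illustrative assumptions.

def transient_droop(i_step_a, dt_s, r_pdn_ohm, l_pdn_h):
    """Droop = resistive drop (I*R) plus inductive drop (L*di/dt)."""
    resistive = i_step_a * r_pdn_ohm          # steady-state I*R component
    inductive = l_pdn_h * (i_step_a / dt_s)   # L*di/dt during the edge
    return resistive + inductive

# Example: a 500 A step in 5 µs across 100 µΩ and 50 pH of PDN parasitics
droop = transient_droop(500, 5e-6, 100e-6, 50e-12)
print(f"Transient droop: {droop * 1000:.1f} mV")
```

Even tens of millivolts of droop matter when the core rail itself is below 1V, which is why PDN resistance and inductance must both be minimized.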
What level of peak current requirements are associated with HPC?
The current trend is that a processor’s power consumption doubles every two years. Peak currents of 2,000A are now becoming typical.
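The doubling trend stated above can be projected forward with simple compound growth. The baseline power figure below is an illustrative assumption used only to show the arithmetic.

```python
# Hedged sketch: project processor power assuming it doubles every two
# years, as the FAQ states. The 500 W baseline is an assumed example.

def projected_power(p0_w, years, doubling_period_years=2):
    """Power after `years`, starting from p0_w, doubling each period."""
    return p0_w * 2 ** (years / doubling_period_years)

print(projected_power(500, 4))  # an assumed 500 W part four years out
```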
What are the factors limiting further increase in HPC performance?
In most cases, power delivery is now the limiting factor in computing performance. Processors are capable of higher performance if the proper power is supplied. Power delivery entails not just the distribution of power but also the efficiency, size, cost and thermal performance of the power delivery network (PDN). Given that PCB space is finite, power dense components are the best option to optimize the PDN.
PDNs are further challenged not only by rising power levels but also by highly dynamic workloads that can create voltage spikes capable of disrupting or damaging sophisticated processors. Routing power paths to limit this is complicated by the number of other PCB components that also need board real estate. The PDN is also subject to I²R losses, which not only reduce efficiency but also create thermal issues if not managed properly.
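Because conduction loss scales with the square of current, doubling the load current quadruples the power dissipated in the same copper. The sketch below shows this scaling; the PDN resistance value is an illustrative assumption.

```python
# Hedged sketch: I^2*R conduction loss in a PDN. The 200 µΩ resistance
# is an assumed example value, not a measured figure.

def i2r_loss(current_a, resistance_ohm):
    """Power dissipated in the distribution path as heat."""
    return current_a ** 2 * resistance_ohm

r_pdn = 200e-6  # assumed 200 µΩ of PDN resistance
print(i2r_loss(1000, r_pdn))  # loss at 1,000 A
print(i2r_loss(2000, r_pdn))  # doubling current quadruples the loss
```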
Why are AI or high-performance computing currents hard to manage?
AI/HPC currents are hard to manage for two reasons. First, as loads and currents increase, the heavier currents can quickly lead to unsustainable I²R losses across the power delivery network.
Second, transients become harder to handle because the absolute difference between peak and idle current grows, and the di/dt is higher.
In addition, a large number of external socket capacitors is required to keep the load voltage within the ripple envelope.
Why is 48V the “new 12V”? What are the challenges?
To increase overall efficiency, data centers are migrating from 12VDC to 48VDC supply rails. As a result, currents between the printed circuit board input and the final conversion stage drop by a factor of 4, and the corresponding ohmic losses by a factor of 16.
At the same time, CPU core voltages are dropping well below 1V. The gap between supply and point-of-load voltages is therefore widening from both directions, and this is a problem because regulator efficiency decreases as voltage differentials increase.
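The factor-of-4 and factor-of-16 figures above follow directly from P = V·I and P = I²R, as the sketch below shows. The load power and distribution resistance are illustrative assumptions.

```python
# Hedged sketch: distribution loss at 12 V vs 48 V for the same load
# power and copper. The 1.2 kW load and 10 mΩ path are assumed examples.

def bus_loss(power_w, bus_v, r_ohm):
    """I^2*R loss in the distribution path for a given bus voltage."""
    i = power_w / bus_v   # same power delivered at lower current
    return i ** 2 * r_ohm

p, r = 1200.0, 0.01              # assumed 1.2 kW load, 10 mΩ path
loss_12v = bus_loss(p, 12, r)    # 100 A flowing in the path
loss_48v = bus_loss(p, 48, r)    # only 25 A: current drops by 4x
print(loss_12v / loss_48v)       # loss ratio: a factor of 16
```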
Why are conventional approaches to power delivery inadequate?
In a typical processor package, all the current is consumed by the core at the center. This means that even if the voltage regulator is positioned adjacent to the edge of the package, there is still a significant distance that the high current must travel to get to the core. The current’s path is known as the ‘last inch’, and is subject to PCB resistive losses as well as parasitic inductance and capacitance.
In the conventional voltage regulator approach, higher current requires more phases. Since most multi-phase voltage regulators are discrete devices, the inductor and switching stage must be laid out individually, and in most cases cooled individually as well. More phases therefore mean a larger voltage regulator, making close placement near the processor more challenging.
Additionally, any power solution using the conventional multi-phase approach must be sized to accommodate peak currents. By contrast, Vicor designs need only be sized for steady state conditions as the Vicor VTM modules can deliver 2x the rated power for transients.
How can AI/HPC power delivery challenges be mitigated?
The Vicor Factorized Power Architecture (FPA) is the foundation for delivering more efficient power for today’s unprecedented surging computing demands. FPA divides the task of a power converter into the dedicated functions of regulation and transformation. Separating the two functions allows both to be optimized individually to foster high efficiency and high density. FPA in conjunction with the Sine Amplitude Converter (SAC) topology underpins several innovative power architectures that can help unleash today’s high-performance processors.
Leveraging FPA, Vicor minimizes the “last inch” resistances via two proprietary architectures: lateral power delivery (LPD) and vertical power delivery (VPD). In LPD, two current multipliers (Vicor VTM™ modules) flank the processor on its north and south or east and west sides.
Vertical power delivery is the ultimate way of delivering high current at low processor core voltages with the lowest PDN resistance. In this case, current multipliers are mounted directly underneath the processor. In both cases, they dramatically reduce last inch losses.
It is also possible to combine both approaches to optimize PCB usage in case of very high currents.
For VPD, the final current multiplier stage and bypass capacitors can be stacked on each other to form an integrated power module (a geared current multiplier) that mounts directly underneath the processor, displacing the bypass capacitor bank.
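A fixed-ratio current multiplier stage behaves, to a first approximation, like a DC "transformer": it steps voltage down by a fixed factor K and steps current up by 1/K. The sketch below shows this idealized relationship, ignoring conversion losses; the K factor and input values are illustrative assumptions, not specifications of any particular VTM module.

```python
# Hedged sketch: idealized fixed-ratio current multiplication.
# Vout = K * Vin and (lossless) Iout = Iin / K. K and the 48 V / 20 A
# input are assumed example values.

def current_multiplier(v_in, i_in, k):
    """Ideal fixed-ratio stage: voltage down by K, current up by 1/K."""
    return v_in * k, i_in / k

v_out, i_out = current_multiplier(48.0, 20.0, 1 / 48)
print(v_out, i_out)  # roughly 1 V at 960 A from a 48 V, 20 A input
```

This is why the low-current 48V side can be routed across the board cheaply, while the very high output current only has to travel the short vertical path to the processor.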
Why is packaging technology important for AI/HPC power delivery solutions?
While the topology and architecture used to implement a high-performance regulator are important, the packaging technology is equally important. Vicor’s SM-ChiP™ (Converter housed in Package) packaging integrates everything – passives, magnetics, FETs and control – into a single device.
Moreover, this package is engineered to enable the most efficient extraction of current at the lowest thermal impedance to facilitate cooling. Many SM-ChiPs also include grounded metal shielding over a significant surface of the device. This serves not only to facilitate cooling but also to localize high-frequency parasitic currents to keep them from propagating outside the device.
How can legacy 12V units be used within 48V intermediate bus power supply systems?
Data centers are deploying 48V power delivery networks (PDNs) as system power levels continue to increase; 48V based architectures maximize power network efficiency while maintaining Safety Extra Low Voltage (SELV) levels.
Accordingly, the Open Compute Project (OCP) is supporting the move to 48V with its Open Rack Standard V2.2 for distributed 48V server backplane architectures and a 48V standard operating voltage for AI Open Accelerator Modules (OAM).
These new standards require 48V to 12V and 12V to 48V compatibility to support 12V legacy backplanes and 12V multi-phase VRs for processors. However, conventional 1/8 and 1/4 open frame brick converters are large and bulky and do not meet the power density needs of advanced systems. In addition, conventional converter topologies have inefficiencies that can reduce the 48V distribution gains.
Vicor has introduced new high-density and high-efficiency module solutions for bridging 48V to 12V and 12V to 48V systems. These regulated and fixed-ratio converters are enabling 48V PDN deployment with its inherent efficiency advantages while alleviating the burden of re-designing 12V legacy systems. These converters establish new power conversion performance standards along with options for various application needs.
How does data center Power Usage Effectiveness (PUE) affect the HPC power delivery network?
The data center’s PUE reflects how much of the power entering a data center is being expended on non-computing activities, and how much is available for the computer PDNs.
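PUE is defined as total facility power divided by the power delivered to IT equipment, so a PUE of 1.0 would mean every watt entering the facility reaches the computing load. The values below are illustrative assumptions.

```python
# Hedged sketch: Power Usage Effectiveness. The 1,500 kW facility and
# 1,000 kW IT load are assumed example figures.

def pue(total_facility_kw, it_kw):
    """PUE = total facility power / IT equipment power (ideal = 1.0)."""
    return total_facility_kw / it_kw

print(pue(1500, 1000))  # a third of facility power goes to overhead
```

The more efficient the PDN itself, the less of the IT power budget is lost as heat, and the less cooling overhead it adds to the facility side of the ratio.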
How does Vicor design an effective AI/HPC power delivery network?
While every design is different, Vicor goes through a methodical seven-step process to optimize a PDN for a specific purpose:
- Understand the entire application: what is its primary function, and what power-related features would enable further improvements?
- Review the customer’s past and present solutions to see where improvements can be made.
- Retrieve the requirements (CART file) and propose PRM™ and VTM™ modules accordingly.
- Prepare the schematic and layout based on existing evaluation boards.
- After thorough review by field and factory applications engineers, the customer produces a few test boards.
- Bench those boards and run them through many types of test (loop tuning for transients, phase/gain margin, etc.).
- Once the customer is satisfied, save all settings from the previous step and release the information to the factory for mass production.
Why is computing at the edge particularly challenging?
The success of edge computing depends on the availability of suitable hardware; systems that can economically provide the necessary processing speed and power, while being able to survive in the less regulated and more unpredictable environments encountered away from the conventional data center.
The edge computing hardware must comprise compact, energy-efficient solutions that can be widely deployed, even in space-constrained and harsh environments, to locate computing closer to sensors and other data sources. This hardware includes the power delivery networks and, conventionally, bulky low-voltage power solutions. These cannot support the increasing power density and small form factors at the edge and are a major bottleneck for edge computing innovation.
How can Vicor technology contribute to edge computing power delivery?
Vicor technology can be used in designing highly scalable, compact, edge computing resources that can thrive outside in harsh environments.
Such resources solve technical bottlenecks such as maintaining signal integrity over shorter electrical traces, while offering highly compact, efficient power conversion and energy-efficient cooling. In addition to being device-agnostic, flexible and scalable, the resulting systems are high-performing and can save at least 40% on energy consumption compared to legacy systems.
Vicor high-density, high-efficiency power modules contribute to solid-state, thermally adept, compact, energy-efficient EMDC designs.
Anish Jacob is a Principal Field Application Engineer specializing in the Data Center and AI market at Vicor Corporation. He holds degrees from The Ohio State University and the University of Southern California and has been a valuable member of the Vicor team since April 2015, offering his deep expertise to deliver cutting-edge solutions and support to customers.