
Scaling Infrastructure for the Age of AI

By: Jenn Mullen, Keysight Technologies

PricewaterhouseCoopers (PwC) estimates that 45% of total global economic gains by 2030 will be driven by AI as more sectors embrace its productivity and product-enhancement benefits. PwC’s research further suggests that AI could add an additional US$15.7 trillion to global gross domestic product (GDP), an increase of roughly 14%.

Enterprise infrastructure, including servers, storage systems, and networking equipment, is what enables this progress. But as AI adoption accelerates, demand for AI-ready compute, storage, and network capacity is already outstripping supply. This era-defining economic opportunity is driving requirements for computational power and power density beyond what current capacity can handle.

Computational power: the modern world’s most precious resource

AI data centres enable AI innovation because they offer the immense data storage, lightning-fast networking, and high-performance computing (HPC) capabilities required for AI workloads. They also have sophisticated cooling and power management systems that address the challenges associated with the high-density power demands of AI hardware. Without these unique features, the pioneering innovation pushing AI to its limits today would not be possible. However, as more organisations look to capitalise on the potential of AI, AI data centre designs — and the systems they employ — must evolve.

AI systems are rooted in machine learning (ML) and deep learning techniques, both of which are notorious for their computational intensity. During training, AI models process vast quantities of data, adapting and refining their parameters to optimise performance. Even for basic models, this is a computationally demanding process.
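
As a rough illustration, the sketch below uses plain Python, a toy linear model, and made-up data rather than any particular AI framework. It shows the adapt-and-refine loop in miniature: each pass over the data nudges the parameters to reduce error, and real model training repeats this at a scale many orders of magnitude larger.

```python
# Minimal training-loop sketch: gradient descent on a toy linear model,
# illustrating how training repeatedly refines parameters against data.
# Data, model, and hyperparameters are illustrative assumptions.
import random

# Hypothetical training data: y = 3x + a little noise
data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in [i / 10 for i in range(100)]]

w, b = 0.0, 0.0          # model parameters, refined over training
lr = 0.01                # learning rate

for epoch in range(2000):                # each pass processes the whole dataset
    grad_w = grad_b = 0.0
    for x, y in data:
        err = (w * x + b) - y            # prediction error
        grad_w += 2 * err * x / len(data)
        grad_b += 2 * err / len(data)
    w -= lr * grad_w                     # adjust parameters to reduce loss
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")   # should approach w≈3, b≈0
```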

In just a few short years, AI-based applications have advanced to the point that they are subject to the law of diminishing returns. Increasingly complex models are needed to enhance existing use cases and to push the boundaries of emerging ones like generative AI (gen AI). However, the computational power required to train advanced AI and ML algorithms grows by orders of magnitude as models become more sophisticated and more is demanded of them. OpenAI’s early machine learning models illustrate the scale: over the course of six years, the computing power required to train the company’s models increased roughly 300,000-fold.
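
To put that figure in perspective, a quick back-of-the-envelope calculation, using only the round numbers above, shows how short the implied doubling time is:

```python
# Rough calculation: what a 300,000-fold compute increase over roughly
# six years implies as a doubling time (round figures as cited above).
import math

growth_factor = 300_000          # total increase in training compute
years = 6                        # period cited above

doublings = math.log2(growth_factor)           # ≈ 18.2 doublings
months_per_doubling = years * 12 / doublings   # ≈ 4 months

print(f"{doublings:.1f} doublings, one roughly every {months_per_doubling:.1f} months")
```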

Six years ago, OpenAI had little competition for the resources required to train the models that became ChatGPT. Now, significantly more players are training gen AI models, all vying for access to only modestly more resources. Computational power on the scale needed to produce the next ChatGPT has become a precious, finite resource, and expanding access to it is costly, as the size of the investments now flowing into new data centres attests. With AI evolving at breakneck speed, AI data centre developers are looking for ways to ensure that these critical innovation enablers can adapt and scale to meet future demand.

Planning for the unpredictable

Building a data centre for the age of AI means ensuring that these facilities can accommodate the power consumption of large-scale GPU clusters, adapt to shifting balances between cloud and edge computing, and increase capacity to keep pace with rising demand without disruptions or downtime. Alongside adding capacity by constructing new data centres, it is crucial to ensure they are reliable and secure. Today, traditional data centre testing solutions are used to design and test both the components and the systems that make up AI data centres, but this approach has been pushed past its limits and a new one is needed.
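
As a rough illustration of the first of those requirements, the sketch below estimates facility-level power for a hypothetical GPU cluster. Every figure in it (cluster size, per-accelerator draw, host overhead, PUE) is an assumption chosen for illustration, not a vendor specification.

```python
# Rough sizing sketch with hypothetical numbers: estimate facility power
# for a GPU cluster from per-accelerator draw, host overhead, and PUE.
num_gpus = 10_000        # hypothetical cluster size
watts_per_gpu = 700      # assumed accelerator power draw (W)
host_overhead = 1.5      # assumed factor for CPUs, memory, NICs, storage
pue = 1.3                # assumed power usage effectiveness (cooling, distribution)

it_load_mw = num_gpus * watts_per_gpu * host_overhead / 1e6
facility_mw = it_load_mw * pue

print(f"IT load ≈ {it_load_mw:.1f} MW, facility draw ≈ {facility_mw:.1f} MW")
```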

AI data centres are composed of intricate systems built from a complex web of individual components. A flaw in any one of these pieces weakens the infrastructure that enables the innovation it promises and puts the capital behind it at risk. An AI data centre, then, is only as reliable as its weakest link. At the cutting edge of performance, every chip, cable, interconnect, switch, server, and GPU represents both vast potential and equally steep risk. To mitigate this risk, each component must work both independently and cohesively as a system, under relentless and growing demand.
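
The weakest-link point can be made concrete with a simple series-availability model. The sketch below assumes independent components and illustrative availability figures, a simplification of real data centre behaviour, but it shows why the whole chain ends up less reliable than any single part.

```python
# Series-reliability sketch: if a workload depends on every component in a
# chain, overall availability is the product of component availabilities,
# so the weakest link dominates. Figures are illustrative, not measured.
components = {
    "optics":       0.9999,
    "switch":       0.9995,
    "server":       0.999,
    "gpu":          0.998,    # the weakest link in this hypothetical chain
    "interconnect": 0.9999,
}

system_availability = 1.0
for name, availability in components.items():
    system_availability *= availability

print(f"system availability ≈ {system_availability:.4f}")  # ≈ 0.996, below every single part
```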

Building networks capable of handling the crushing demands of AI workloads means validating every component, connection, and configuration. With the stakes and scale this high, even the smallest efficiency gain, operational improvement, or performance enhancement can push back against innovation’s diminishing returns. Staking a successful, profitable claim in this modern gold rush requires a new technology stack that can withstand whatever the future brings.

Future-proofing AI innovation

Meeting future demand for AI-ready networks, semiconductors, and data centre equipment requires an equally AI-ready stack of test and simulation tools, and those tools will set the successful apart. Keysight is helping AI data centre designers future-proof their designs and build out a robust tech stack tailored to the dynamic needs of these complex environments. With a full-stack portfolio of simulators, emulators, and test hardware, Keysight solutions emulate real-world AI workloads, validate network components, and optimise system-level performance across every layer, from physical hardware to application-level behaviour.

Jenn Mullen

Keysight Technologies
