Modernising A Sustainable Data Centre For The AI Era
By Madhu Rangarajan, Corporate Vice President, Server Solutions Group, AMD
Enterprise data centres are at a crossroads today. The explosive growth of AI around the world is creating unprecedented demand for compute power, and scaling out compute, data, and AI workloads has never been more important. At the same time, IT leaders are under continual pressure to reduce their carbon footprint and costs. As a result, many are accelerating their modernisation efforts and transitioning to more performant, efficient and cost-effective hardware.
This process of migrating and re-platforming existing digital business workloads can be complex. Many organisations fear possible business disruption and question whether investing in new systems to support applications that currently work really makes sense. Unfortunately, if applications are left untouched for too long, they can become brittle and hard to integrate. Even finding staff to maintain them can become problematic.
Within this dynamic environment, it is imperative that organisations rethink how they approach data centre design, construction, and operation to achieve a sustainable pace of modernisation without compromising their ability to capitalise on the immense potential of AI.
Modernising with energy efficient architecture
Undertaking modernisation projects is a strategic choice – it can improve business efficiency, increase engineering productivity, accelerate innovation and even reduce carbon footprint at the same time – and alignment with the business objectives of the organisation is critical. Understanding the nature of the workloads and business processes helps drive value.
For instance, AI compute racks can require seven times the power of the typical enterprise data centre rack, and these requirements are expected to increase to 20 times the power of typical racks as the technology evolves further. Therefore, while modernising generally brings about smaller footprints, cost improvements and efficiency gains, transforming data centres to support new AI requirements will amplify power, cooling and water resource consumption.
The first step of modernisation should therefore be a careful evaluation of the available platforms. The key is to benchmark a representative set of applications in a Proof-of-Concept (POC) whose success metrics can reliably predict production-scale outcomes. IT leaders need to ensure that the return on investment (ROI) demonstrated at the POC stage is multi-fold. They need to look for benchmark performance – a significant improvement in performance is essential – as well as consider application-specific performance. This, coupled with better scaling and operational efficiencies, can inform decisions on whether to add new use cases within the same energy footprint.
CIOs can also modernise their data centres and realise huge savings in power consumption and floor space. When compared against last-generation processors, the latest server offerings can consume up to 68% less power, require up to 87% fewer servers and deliver up to 67% lower three-year TCO. Advances in technology simplify efforts to increase density, improve performance, and deliver agility – all within the same physical footprint. This provides the headroom needed for new or more demanding workloads without data centre changes that would otherwise increase power and cooling demands.
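To make figures like these concrete, a back-of-the-envelope consolidation model can be sketched. The fleet size, per-server wattage and consolidation ratio below are hypothetical placeholders chosen for illustration, not vendor data:

```python
# Hypothetical consolidation model: estimate server-count and power savings
# when replacing an existing fleet with newer, denser machines.
# All input figures are illustrative only.

def consolidation_savings(old_servers, old_watts_each, new_watts_each,
                          consolidation_ratio):
    """consolidation_ratio: how many old servers one new server replaces."""
    new_servers = -(-old_servers // consolidation_ratio)  # ceiling division
    old_power_kw = old_servers * old_watts_each / 1000
    new_power_kw = new_servers * new_watts_each / 1000
    return {
        "new_servers": new_servers,
        "server_reduction_pct": round(100 * (1 - new_servers / old_servers), 1),
        "power_reduction_pct": round(100 * (1 - new_power_kw / old_power_kw), 1),
    }

# Example: 1,000 legacy servers at 400 W each, consolidated roughly 8:1
# onto newer 750 W machines (hypothetical figures).
print(consolidation_savings(1000, 400, 750, 8))
```

Even though each new server draws more power individually, the far smaller fleet cuts total power substantially – which is why consolidation ratio, not per-box wattage, tends to dominate the outcome.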
Adaptable infrastructure
Modernisation efforts, however, can be a challenge for IT leaders to articulate to business and executive management. The potential risks of execution complexity and increased investment are rational concerns for the IT leader. While “leaving well enough alone” is a strategy with low near-term risk, it can leave the enterprise open to larger strategic risks and missed innovation opportunities. The right hardware platform, coupled with adaptable infrastructure, can make all the difference.
To maximise the business value of AI, IT leaders need to adopt a fit-for-purpose mindset for CPU selection and deployment, matching legacy and modern workloads to the CPUs best suited to the job at hand across data centre and cloud environments.
The exercise is an opportunity to deliver substantial rewards by leveraging open, efficient, and highly performant platforms. This adaptability is crucial for navigating the evolving landscape of AI. Today, the momentum of open technology innovation from x86-based platforms – such as AMD EPYC™ – can significantly reduce the risk while amplifying the rewards. Compute on x86 infrastructure supports a diversity of power-, performance-, and throughput-optimised applications for tomorrow’s enterprises.
As an example of the benefits of technological openness, application software designed for x86 platforms is delivered as a single package distribution across all platforms of the same architecture. This means that businesses can run any software – whether the OS, applications, or in-house developed software – with little to no modification on AMD EPYC™ processors when transitioning from other x86 platforms. IT leaders can therefore modernise their data centres with minimal disruption, leveraging the openness of the ecosystem to migrate to other vendors once they have outgrown their existing platform.
This flexibility translates to a major business advantage for enterprises. For example, upon migrating to AMD EPYC™ processors, Yahoo! Japan was able to reduce the number of racks needed to run the same number of virtual machines by about 20 percent compared to its previous vendor. Similarly, France-based data centre provider Cyllene saw a 30% drop in power consumption and heat emissions while measuring a 40% uplift in application performance following its deployment of AMD EPYC™ processors. Having standby capacity that is ready to go in an instant but not drawing power until needed enables the provisioning of new digital resources at the same speed as the cloud. Modern management tools make it possible to have resources “idling,” to be used as needed.
A holistic approach
Today, AI holds great promise, which also comes with its own appetite for power. A holistic strategy that encompasses hardware, software, and operational best practices is essential to optimise the entire data centre ecosystem for sustainability.
Unlike many traditional applications designed to run on standard CPUs, most AI workloads have unique use-case-specific performance, security, data compliance, and cost requirements. Every infrastructure component also carries costs beyond the price of the hardware itself, such as software licences, maintenance, power, cooling, and administration.
The most important steps IT teams can take to improve data centre energy and operational efficiency are straightforward: optimise each component and use the latest and most capable technologies. This includes not only selecting energy-efficient hardware but also implementing next-generation management and operations tools. Modern operations software is essential to optimising data centre efficiency; for example, it is now possible to spin up or down individual devices as needed, reducing the cost of powering and managing unused capacity. Better management and operations tools also improve resiliency and help reduce downtime.
This comprehensive approach ensures that the data centre is optimised not only for performance but also for sustainability, aligning with ESG goals.