The challenge
A more sustainable approach to powering artificial intelligence (AI)
There has been an explosion in the demand for artificial intelligence in research. As researchers’ goals become more complex and significant, their AI models become increasingly intricate and the data sets they need to process grow exponentially, requiring more powerful computing resources.
Researchers who want to train or adapt foundation models or other large AI models need access to AI accelerator chips known as graphics processing units (GPUs).
All of this computing power requires a lot of energy to run, so we needed to find a more sustainable, environmentally friendly approach to powering our next high-performance computer (HPC).
Our response
Speed up scientific discoveries
We worked with Dell Technologies to build a high-performance computer (HPC) system which will speed up scientific discoveries and help grow Australia’s industry and economy.
Named Virga, the HPC system is built on state-of-the-art Dell PowerEdge XE9640 servers and is the first deployment of its kind in Australia. It is designed to optimise artificial intelligence (AI) workflows while remaining power-efficient through direct liquid cooling.
Professor Elanor Huntington, CSIRO’s Digital, National Facilities and Collections Executive Director, said Virga will provide the critical computing infrastructure needed for machine learning and AI to grow Australia’s industry and economy.
“CSIRO is proud to be a steward of some of Australia’s most important pieces of research infrastructure,”
Professor Elanor Huntington
“AI is used in practically all fields of research at CSIRO, such as developing world-leading flexible printed solar panels, predicting fires, measuring wheat crops and developing vaccines, just to name a few.
“High-performance computing systems like Virga also play an important role in CSIRO’s robotics and sensing work and are crucial to the recently launched National Robotics Strategy to drive competitiveness and productivity of Australian industry.”
The HPC cluster, which is housed at CDC’s Hume Data Centre in Canberra, is named after virga, the meteorological effect of rain that evaporates before it reaches the ground. The name recognises CSIRO’s decades of research into cloud and rain physics.
The Virga cluster features:
- NVIDIA H100 Tensor Core GPU accelerators to support deep learning, machine learning and AI
- 94 GB of high-bandwidth memory per GPU
- Transformer Engine, which significantly speeds up AI performance and helps train large models within days or even hours
- 4th Gen Intel® Xeon® Scalable processors
- Hybrid direct liquid cooling to reduce the need for energy-intensive air cooling
The results
AI models to diagnose pathology from MRI scans
Dr Jason Dowling from CSIRO’s Australian e-Health Research Centre said the increase in medical imaging data coupled with the growing complexity of diagnostic techniques has led to an urgent need for advanced computational power and data processing for medical image analysis.
“The new HPC facilities will allow researchers in our Australian e-Health Research Centre to train and validate new computational models, which will help us develop translational software in medical image analysis for image classification, segmentation, reconstruction, registration, synthesis, and automated radiology reporting,”
Dr Jason Dowling
“One collaborative project with the Queensland Children’s Hospital that will benefit from the new cluster is the training of artificial intelligence (AI) models to diagnose pathology from MRI scans of the lungs in children with cystic fibrosis.”