
Optimizing Compute Performance: A Case Study on Nanyang Technological University

Altair
Infrastructure as a Service (IaaS) - Hybrid Cloud
Networks & Connectivity - Ethernet
Education
Semiconductors
Maintenance
Product Research & Development
Smart Campus
Time Sensitive Networking
Cloud Planning, Design & Implementation Services
Nanyang Technological University's High Performance Computing Centre (HPCC) was facing a significant challenge. With over 4,500 CPU cores, 40 NVIDIA Tesla GPGPU cards, 2,700 TB of storage, a 100 Gb/s InfiniBand interconnect, and a 40G/100G Ethernet backbone, along with technical support, HPCC delivered nearly 19 million CPU core-hours and nearly 300,000 GPU-hours in 2021 to more than 160 NTU researchers. The HPCC digital community had grown to nearly 800 NTU members, and as membership continued to increase, so did the number of HPC and AI applications. The small, four-engineer team at HPCC needed cutting-edge tools to support this growing user community and to evaluate scaling up to a hybrid cloud environment. They required job-level insights to understand runtime issues, I/O, CPU, and memory metrics to identify bottlenecks, and the ability to detect problematic applications and rogue jobs whose bad I/O patterns could overload shared storage.
Nanyang Technological University (NTU) Singapore is a research-intensive public institution that supports around 33,000 students and 10,000 staff in engineering, science, business, humanities, arts, social sciences, and medicine. NTU is one of the world’s most prestigious universities and among the oldest in Singapore, with the nation’s largest campus at nearly 500 acres. NTU’s High Performance Computing Centre (HPCC) was established in 2010 to support the university’s large-scale and data-intensive computing needs, and demand for its resources continues to grow.
To address these challenges, the HPCC team deployed Altair Mistral to profile application I/O and identify the most efficient ways to optimize HPC at NTU. They measured the performance of the popular Gaussian computational chemistry application on three types of storage: local NVMe, tier 1 scale-out all-flash NAS, and tier 2 scale-out NAS with SSD/HDD. Mistral captured each job run’s characteristics across several metrics, including read and write counts, read and write bytes, memory usage, processing time, and I/O latency. These metrics revealed the strengths and weaknesses of each storage type. With I/O profiling from Mistral, NTU’s HPCC team can now match applications to the best-fitting nodes, choose the most affordable, best-performing storage for each application type, and decide which workloads are best suited to cloud versus on-premises infrastructure.
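Per-job counters like those listed above can be combined into simple derived indicators. The sketch below is hypothetical: it does not use Mistral's actual output format or API, and the `JobIO` record, its field names, and both thresholds are assumptions chosen for illustration. It divides total bytes by total operations to get a mean I/O size, one plausible way to spot the kind of rogue job that issues huge numbers of tiny reads or writes against shared storage.

```python
from dataclasses import dataclass


@dataclass
class JobIO:
    """Per-job I/O counters (hypothetical field names, modeled on the metrics above)."""
    job_id: str
    read_count: int
    read_bytes: int
    write_count: int
    write_bytes: int


def mean_io_size(job: JobIO) -> float:
    """Average bytes moved per read or write operation (0.0 if the job did no I/O)."""
    ops = job.read_count + job.write_count
    return (job.read_bytes + job.write_bytes) / ops if ops else 0.0


def flag_small_io_jobs(jobs, min_ops=1_000_000, max_mean_bytes=4096):
    """Return IDs of jobs that issue many small I/O operations.

    Both thresholds are illustrative assumptions, not Mistral defaults.
    """
    return [
        job.job_id
        for job in jobs
        if job.read_count + job.write_count >= min_ops
        and mean_io_size(job) <= max_mean_bytes
    ]


# Hypothetical jobs: job-1 issues two million 512-byte reads; job-2 does a few 1 MiB transfers.
example_jobs = [
    JobIO("job-1", read_count=2_000_000, read_bytes=2_000_000 * 512,
          write_count=0, write_bytes=0),
    JobIO("job-2", read_count=10, read_bytes=10 * 1_048_576,
          write_count=10, write_bytes=10 * 1_048_576),
]
print(flag_small_io_jobs(example_jobs))  # prints ['job-1']
```

A low mean I/O size combined with a high operation count is a common warning sign on shared file systems, but the cutoffs here are placeholders that would need tuning for a given site and workload mix.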
Based on the Mistral profiling, the HPCC team at NTU Singapore concluded that a hybrid architecture combining different storage media with a good L3 cache could be more performant and cost-effective than relying on a single storage medium, especially if the operating system can exploit the strengths of each medium. The team plans to collect metrics for additional applications and continue tuning and optimization to support education and research. The HPCC team is satisfied with Mistral’s results and with Altair: Melvin Soh Hwee Jin, Senior Assistant Director of the High Performance Computing Centre at NTU, praised Altair for its personal and professional support for customers.
In the Gaussian benchmark, job runtimes by storage type were:
Local NVMe was fastest, completing the job in 32,208 s.
Tier 2 scale-out NAS with SSD/HDD finished second at 34,326 s.
Tier 1 scale-out all-flash NAS was slowest at 40,746 s.
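The three Gaussian runtimes are easier to compare in relative terms. This short sketch uses only the figures reported in the case study; the helper name `slowdown_vs_fastest` is illustrative, not part of any Altair tool.

```python
# Gaussian job wall-clock times (seconds) reported in the NTU case study.
RUNTIMES_S = {
    "local NVMe": 32_208,
    "tier 2 scale-out NAS (SSD/HDD)": 34_326,
    "tier 1 scale-out all-flash NAS": 40_746,
}


def slowdown_vs_fastest(runtimes):
    """Map each storage type to its runtime divided by the fastest runtime (1.0 = fastest)."""
    fastest = min(runtimes.values())
    return {name: t / fastest for name, t in runtimes.items()}


# Print storage types from fastest to slowest with their relative runtime.
for name, ratio in sorted(slowdown_vs_fastest(RUNTIMES_S).items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.3f}x the fastest runtime (+{(ratio - 1) * 100:.1f}%)")
```

Run as written, it reports tier 2 taking about 6.6% longer than local NVMe and tier 1 about 26.5% longer, so in this single benchmark the SSD/HDD tier 2 NAS outperformed the all-flash tier 1 NAS.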