Clemson University's Adoption of PBS Professional for Enhanced HPC Workload Management

Clemson University's IT department, Clemson Computing and Information Technology (CCIT), faced a significant challenge in managing workloads for its rapidly growing user base. The department operated the Palmetto cluster, a 17,032-core, 262 TFlops HPC system, as the university's primary HPC resource. The cluster was heavily used by the university's faculty, staff, and students, as well as by 144 external users, including researchers and faculty from other universities. It ran on a 'condo model', in which users could purchase nodes for their own priority usage. However, the open-source Maui scheduler that CCIT previously used could not meet the scalability and reliability needs of the expanding user base. The system frequently crashed and some advanced features did not function properly, which made scheduling unreliable.
Clemson University is a major land-grant, science- and engineering-oriented research university that ranks in the top 25 among national public universities. The university is committed to teaching and student success, fostering an inclusive, student-centered community characterized by high academic standards, a culture of collaboration, school spirit, and a competitive drive to excel. The university's IT department, Clemson Computing and Information Technology (CCIT), provides cyberinfrastructure resources and advanced research computing capabilities. CCIT supports an array of advanced computing infrastructure made possible through the integration of high-performance computing (HPC), high-performance networks, data visualization, storage architectures, and middleware.
To address these challenges, CCIT decided to adopt a commercial-grade workload management solution. After evaluating several vendors, CCIT chose Altair’s PBS Professional® for its massive scalability and technical support. The PBS Professional scheduling software met the university's HPC needs, providing the reliability and scalability that the previous open-source tool could not deliver. Altair's technical team provided comprehensive support, helping CCIT understand the advanced features of PBS Professional before purchase and offering hands-on training before installation. Cost was also a crucial factor in the decision: Altair offered attractive academic pricing that fit within CCIT's budget. The implementation of PBS Professional began in September 2011, supporting 1,623 nodes. Today, the node count has increased to 1,804, and PBS Professional can easily scale to support additional nodes for the rapidly growing user base.
The adoption of PBS Professional has improved usability and productivity for CCIT and the university's users. HPC administration overhead has been significantly reduced, and demand for end-user support has decreased thanks to the immediate and automatic feedback provided by PBS Professional's hooks plug-in technology. Users can now easily submit large numbers of jobs, even queuing up thousands at a time, with confidence that the scheduler will execute them. The system is also integrated with Clemson’s “Hadoop on demand” job framework, which uses myHadoop with the university's own customized open-source file system, OrangeFS. This integration has brought major efficiency benefits: PBS jobs can directly access data stored on OrangeFS from any compute node without data staging, and the data persists between jobs.
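As an illustration of how a user might queue up a large batch of jobs, the following is a minimal sketch that generates a PBS Professional job-array script in Python. It is not taken from Clemson's setup: the helper name `build_array_job_script`, the job name `sweep`, the command `./run_case.sh`, and the resource request are all hypothetical. It assumes the standard PBS Professional `#PBS -J` job-array directive and the `PBS_ARRAY_INDEX` and `PBS_O_WORKDIR` environment variables.

```python
def build_array_job_script(job_name, first_index, last_index, command,
                           walltime="00:10:00"):
    """Return the text of a PBS job script that runs `command` once per
    array index. All argument values are illustrative placeholders."""
    if first_index >= last_index:
        raise ValueError("first_index must be smaller than last_index")
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        # Hypothetical resource request: one chunk with one core per subjob.
        "#PBS -l select=1:ncpus=1",
        f"#PBS -l walltime={walltime}",
        # -J turns the submission into a job array covering the index range.
        f"#PBS -J {first_index}-{last_index}",
        "",
        # Start in the directory the job was submitted from.
        'cd "$PBS_O_WORKDIR"',
        # Each subjob receives its own index in PBS_ARRAY_INDEX.
        f"{command} $PBS_ARRAY_INDEX",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a script describing 1,000 subjobs; a user could save this to a
    # file and submit it with `qsub`.
    print(build_array_job_script("sweep", 1, 1000, "./run_case.sh"))
```

A single array submission of this kind lets the scheduler track many related tasks under one job ID, which is one common way to submit thousands of jobs without writing a script for each.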
PBS Professional supports 1,804 nodes, up from 1,623 nodes at the time of implementation.
The system is scalable and can support additional nodes for the rapidly growing user base.
The Palmetto Cluster is benchmarked at 262 TFlops and is connected to Internet2's 100 GbE Advanced Layer 2 Service.