
ACCELERATE: ACADEMIC RESEARCH - Researching the Genetic Basis of Behavior, Cognition and Affect, USC Needed a High Performance, Scalable Infrastructure to Support Next-Gen Genomics Sequencing

DataDirect Networks
Analytics & Modeling - Big Data Analytics
Infrastructure as a Service (IaaS) - Cloud Storage Services
Healthcare & Hospitals
Life Sciences
Product Research & Development
Predictive Maintenance
Process Control & Optimization
Cloud Planning, Design & Implementation Services
System Integration
The Laboratory of Dr. James Knowles at the Zilkha Neurogenetic Institute, Keck School of Medicine at the University of Southern California (USC), was facing a significant challenge. The lab, which is focused on understanding the genetic basis of behavior, cognition, and affect, was struggling with a legacy SAN storage server that was nearing capacity and could not keep up with data access requirements. Storage throughput was hobbled by the network and by the performance limitations of NFS. The lab needed a new storage solution that could deliver in excess of a gigabyte per second of throughput and scale to petabytes in a single namespace.

The Knowles Lab had a data storage performance problem. It needed to sequence 1,400 full human genomes to support its ongoing studies, work that would generate several terabytes of raw data per day that needed to be transferred, inspected, and aligned to the human genome. The legacy storage system could only output enough data to the CPU cluster to run a single instance of the Burrows-Wheeler Aligner (BWA) under the Pegasus MPI workflow. Furthermore, the lab could only upload data to that system at 30-50 MB/second, nowhere near the 100 MB/second peak theoretical capacity of the GbE network. This bottleneck was more than an inconvenience; it was slowing time to discovery.
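To put those numbers in perspective, a rough back-of-the-envelope sketch is shown below. The daily data volume (3 TB) and the per-BWA-instance read rate (200 MB/s) are illustrative assumptions, not figures from the case study; only the 30-50 MB/s legacy upload rate and the gigabyte-per-second target come from the text above.

# Back-of-the-envelope sketch of the storage bottleneck (illustrative values).
MB = 1024**2
GB = 1024**3
TB = 1024**4

daily_raw_bytes = 3 * TB        # "several terabytes per day" (assumed 3 TB)
legacy_upload = 40 * MB         # midpoint of the observed 30-50 MB/s
target_throughput = 1 * GB      # the >1 GB/s requirement

def hours(byte_count, rate):
    # Transfer time in hours for a given byte count and rate in bytes/second.
    return byte_count / rate / 3600

print(f"Upload at legacy rate (~40 MB/s): {hours(daily_raw_bytes, legacy_upload):5.1f} h per day of data")
print(f"Upload at 1 GB/s:                 {hours(daily_raw_bytes, target_throughput):5.1f} h per day of data")

# How many BWA instances a 1 GB/s system could keep fed, assuming each
# instance streams input at roughly 200 MB/s (hypothetical figure):
per_instance_read = 200 * MB
print(f"Concurrent BWA instances at 1 GB/s: {target_throughput // per_instance_read}")

Under these assumptions, a day's raw data takes on the order of 20 hours to move at the legacy rate but well under an hour at the target throughput, which is why the upload path, not the sequencers, was pacing the science.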
The Zilkha Neurogenetic Institute is an integral part of a broader USC neuroscience initiative that promotes collaboration between researchers from diverse disciplines. It was designed to foster interaction among the best and brightest: scientists at the Institute reach across boundaries to embrace methods and techniques from other fields of study, identifying new approaches to examining nervous system function and, ultimately, the underlying causes of neurological and psychiatric disorders. The Laboratory of Dr. James Knowles is interested in understanding the genetic basis of behavior, cognition, and affect. At present, most of the lab's efforts are directed at understanding the transcriptional program of brain development and the genetics of schizophrenia, bipolar disorder, and obsessive-compulsive disorder. The lab is leveraging high-throughput sequencing technology to look for genetic factors that play an etiological role in psychiatric illness. With this knowledge, the lab aims to improve diagnostic methods and, possibly, develop therapies that improve the quality of life of those affected.
The Knowles Lab and USC's HPCC team worked with DDN to identify a high-performance, scalable, cost-effective solution. USC selected a solution based on DDN's Storage Fusion Architecture®, running an embedded image of the DDN GRIDScaler parallel file system on the SFA10K-E. This solution appeared to meet both parties' needs, supporting a high-performance parallel file system and NFS simultaneously. Additional research groups at the Keck School of Medicine with large amounts of data learned about the impending storage deployment and added their resources to the GRIDScaler purchase, doubling the raw system capacity to over 1 PB. GRIDScaler is a large, high-performance storage appliance with a shared architecture, capable of delivering in excess of 800 million IOPS continuously to the USC HPCC cluster. This provided a clear advantage over smaller, lower-performance units, which would also have required more HPCC management. The storage array is connected to the compute resources via multiple 10GbE links. A separate 10GbE connection goes to the head node, which runs an image of the GRIDScaler client software and acts as an NFS server. The caching server in the Knowles Lab provides the long-haul connection to the head node in the HPCC and acts as a data transfer gateway for the Windows/Linux terminals and instruments located in the Knowles Lab.
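As a rough sanity check on the interconnect, the minimal sketch below estimates usable bandwidth for a few hypothetical 10GbE link counts; the actual number of links and the protocol efficiency are assumptions, not details from the case study. Even a single 10GbE link nominally clears the lab's gigabyte-per-second target, and aggregating links adds headroom for parallel file system traffic.

# Illustrative aggregate-bandwidth estimate for a multi-link 10GbE interconnect.
def usable_gbytes_per_sec(links, nominal_gbits=10.0, efficiency=0.9):
    # 10GbE carries 10 gigabits/s nominally; assume ~90% remains after protocol overhead.
    return links * nominal_gbits * efficiency / 8

for links in (1, 2, 4):   # hypothetical link counts
    print(f"{links} x 10GbE ≈ {usable_gbytes_per_sec(links):.2f} GB/s usable")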
The solution simultaneously supports a high performance parallel file system and NFS.
GRIDScaler is a large, high-performance storage appliance with a shared architecture, capable of delivering in excess of 800 million IOPS continuously to the USC HPCC cluster.
GRIDScaler provides performance, economy and scale.
The new sequencing schedule will generate several terabytes of raw data per day that need to be transferred, inspected, and aligned to the human genome.
The storage array is connected to the compute resources via multiple 10GbE links.