
Curing Advanced Data Ailments Using Data Virtualization to Aid Worldwide War on Cancer

Denodo Technologies
Application Infrastructure & Middleware - Data Exchange & Integration
Platform as a Service (PaaS) - Data Management Platforms
Healthcare & Hospitals
Life Sciences
Product Research & Development
Quality Assurance
Predictive Quality Analytics
Data Science Services
System Integration
The National Institutes of Health (NIH) faced significant obstacles in reliably and efficiently moving large volumes of cancer genome data from The Cancer Genome Atlas (TCGA) to the International Cancer Genome Consortium (ICGC). The process involved transforming the TCGA data to meet ICGC format requirements and then periodically uploading it to ICGC servers. The transformation was initially done with Perl scripts, but that approach did not scale and was costly. It was also error-prone: limited connectivity to the data sources led to redundant copies of data, slower processing, and a greater chance of errors.
The National Institutes of Health (NIH) is the nation’s medical research agency and a component of the U.S. Department of Health and Human Services. It includes 27 Institutes and Centers and is the primary federal agency conducting and supporting basic, clinical, and translational medical research. NIH investigates the causes, treatments, and cures for both common and rare diseases. Two of the 27 institutes that make up NIH, the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), recently joined forces on a project known as The Cancer Genome Atlas (TCGA). TCGA's mission is to catalog the genetic mutations responsible for cancer using genome sequencing and bioinformatics.
NIH used data virtualization to connect to the different sources of genome data, apply transformations, produce the final data sets, and periodically upload those data sets to the ICGC servers. The connectors within the data virtualization platform provided a normalized view of three sources: patient and donor data stored in XML files, sample test results stored in Oracle, and TCGA-ICGC mapping data stored in MySQL. The transformation process had three steps: aggregating the patient and test data, converting the aggregated data into the ICGC format using the mapping information, and creating the final output files in CSV format. Lastly, the scheduler within the data virtualization platform ran an FTP process once every quarter to upload the files to the ICGC servers.
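The three transformation steps described above can be illustrated with a minimal Python sketch. It is not the Denodo workflow NIH built, and it does not use any Denodo API: the in-memory XML string, the list of rows, and the field map are hypothetical stand-ins for the XML files, the Oracle results, and the MySQL mapping data, and every field name (such as `donor_id` and `tumour_code`) is an assumption made for illustration rather than an actual TCGA or ICGC field. The quarterly FTP upload is left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the patient/donor XML files.
PATIENT_XML = """
<patients>
  <patient id="P1"><gender>female</gender></patient>
  <patient id="P2"><gender>male</gender></patient>
</patients>
"""

# Hypothetical stand-in for sample test results queried from Oracle.
SAMPLE_ROWS = [
    {"patient_id": "P1", "sample_id": "S1", "tumor_type": "BRCA"},
    {"patient_id": "P2", "sample_id": "S2", "tumor_type": "GBM"},
    {"patient_id": "P1", "sample_id": "S3", "tumor_type": "BRCA"},
]

# Hypothetical source-field -> target-field mapping, as the MySQL
# mapping tables might supply it. These are not real ICGC field names.
FIELD_MAP = {
    "patient_id": "donor_id",
    "gender": "donor_sex",
    "sample_id": "specimen_id",
    "tumor_type": "tumour_code",
}


def load_patients(xml_text):
    """Parse the patient XML into a dict keyed by patient id."""
    root = ET.fromstring(xml_text.strip())
    return {
        p.get("id"): {"patient_id": p.get("id"), "gender": p.findtext("gender")}
        for p in root.iter("patient")
    }


def aggregate(patients, sample_rows):
    """Step 1: join each sample row with its patient record."""
    joined = []
    for row in sample_rows:
        patient = patients.get(row["patient_id"])
        if patient is None:
            continue  # skip samples with no matching patient
        joined.append({**patient, **row})
    return joined


def convert(records, field_map):
    """Step 2: rename source fields to target fields using the mapping."""
    return [
        {target: rec[source] for source, target in field_map.items()}
        for rec in records
    ]


def to_csv(records, field_map):
    """Step 3: serialize the converted records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(field_map.values()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def run_pipeline():
    patients = load_patients(PATIENT_XML)
    records = convert(aggregate(patients, SAMPLE_ROWS), FIELD_MAP)
    return to_csv(records, FIELD_MAP)


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping the field mapping as data rather than hard-coding it in the conversion step mirrors the case study's point that the mapping information lives in its own source, so a change to the target format would only require updating the mapping.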
Increased scalability: Larger genome data sets can be included, thanks to replicable generic workflows and the platform's advanced performance capabilities.
Increased efficiency: TCGA-ICGC transformation processes are faster to develop and modify because of the platform's diverse connectivity and publishing capabilities.
Increased accuracy: Minimizing replication and manual intervention means the most current versions of data and processes are used to create the output files, which improves the accuracy of the final data.