Acute myeloid leukaemia (AML) kills hundreds of children a year. It's the type of cancer that causes the most deaths in children under two, and in teenagers. It has a poor prognosis, and its treatments can be severely toxic.
Research initiative Target Paediatric AML (tpAML) was set up to change the way that the disease is diagnosed, monitored and treated, through greater use of personalised medicine. Rather than the current one-size-fits-all approach for many diseases, personalised medicine aims to tailor an individual's treatment by looking at their unique circumstances, needs, health, and genetics.
AML is caused by many different types of genetic mutation, alone and together. Those differences can affect both the cancer's prognosis and how it should be treated. To understand better how to find, track and treat the condition, tpAML researchers began building the largest dataset ever compiled around the disease. By sequencing the genomes of more than 2,000 people who have or had the disease, both living and deceased, tpAML's researchers hoped to find previously unknown links between certain mutations and how a cancer could be tackled.
Genomic data is notoriously sizeable, and tpAML's sequencing had generated over a petabyte of it. Beyond the difficulties thrown up by the sheer bulk of data to be analysed, the dataset was also hugely complex: each patient's data had 48,000 linked RNA transcripts to analyse.
Earlier this year, Joe Depa, a father who had lost a daughter to the disease and was working with tpAML, teamed up with his coworkers at Accenture on a project to build a system that could analyse the imposing dataset.
Linking up with tpAML's affiliated data scientists and computational working group, Depa along with data-scientist and genomic-expert colleagues hoped to help turn the data