The world’s largest set of data on human genetic variation — produced by the international 1000 Genomes Project — is now publicly available on the Amazon Web Services (AWS) cloud, the National Institutes of Health and AWS jointly announced today.
Initiated in 2008, the 1000 Genomes Project is an international public-private consortium that aims to build the most detailed map of human genetic variation available, ultimately with data from the genomes of more than 2,600 people from 26 populations around the world. The project began with three pilot studies that assessed strategies for producing a catalog of genetic variants that are present at a frequency of 1 percent or greater in the populations studied. Data from the pilot studies were released on AWS in 2010. The data now being released in the cloud include results from sequencing the DNA of some 1,700 people; the remaining 900 samples will be sequenced in 2012, and those data will be released to researchers as soon as possible. The new results identify genetic variation that occurs in less than 1 percent of the study populations and that may make important genetic contributions to common diseases, such as cancer or diabetes.
“It took more than 10 years and billions of dollars to sequence the first human genome. Recent advances in genome sequencing technology have enabled researchers to tackle studies like the 1000 Genomes Project by collecting far more data faster. This has created a growing need for powerful and instantly available technology infrastructure to analyze that data,” said Deepak Singh, Ph.D., principal product manager, Amazon Web Services. “We’re excited to help scientists gain access to this important data set by making it available to anyone with access to the Internet. This means researchers and labs of all sizes and budgets have access to the complete 1000 Genomes Project data and can immediately start analyzing and crunching the data without the investment it would normally require in hardware, facilities and personnel. Researchers can focus on advancing science, not obtaining the resources required for their research.”
The 1000 Genomes Project welcomes working with other cloud computing providers who are interested in hosting the data. Cloud access to the 1000 Genomes Project data through AWS is at http://s3.amazonaws.com/1000genomes/.