The recent explosion of genomics technology has revolutionized biology, but the technology is only really of use if people are able to analyze and use the resulting sequences. Storage of such vast quantities of data is problematic, as the ongoing uncertainty over the future of NCBI’s arm of the Sequence Read Archive (SRA) shows.
The BGI, in conjunction with BioMed Central, recently launched GigaScience, a journal aimed specifically at data-intensive projects, which can host large datasets alongside the articles describing them. GigaScience also anticipates becoming a repository for stand-alone datasets such as those resulting from genome sequencing projects. One such dataset has just been released: it contains the assembled and annotated genome sequences of three strains of sorghum, a plant of huge economic importance in the developing world as a source of food, fodder, fuel and fiber. The article describing these data has been published in Genome Biology; the raw reads are available from the SRA, and the assembled reads from GigaScience. This is the first time that a dataset has been cited with a DOI in an article’s reference list, making it the first step toward researchers receiving citation credit for the data they generate.