Data Sharing and Publishing in Variant Analysis

Data sharing is a critical component of any research project and something that we’ve thought a lot about at Ingenuity. It is often overlooked in analysis tools, which inspired us to create two must-have features: Share and Publish. You can see both of these tools in action in some recent and compelling papers, which we describe below.

Among the many benefits of our Share and Publish features is that they are built directly into Variant Analysis and take just a few mouse clicks. While both aim to facilitate collaboration and data sharing, there are a few differences between them.

Sharing human genomic data with another user within Variant Analysis is equivalent to sharing data with someone via email, FTP, hard drives, or other standard means of data transfer. Once you share your data, the people you’ve shared it with have a copy of it, and they can use it and further share that copy at their discretion. They can also update the analysis and share it back with you. If they don’t have an account, they’ll have an opportunity to set up a secure account at no charge when they receive your shared results.

You can also publish an analysis through a custom URL if you want your analyzed Variant Analysis data to be included as an online supplement for an article. As with sharing, people wishing to view the analysis can access it via a free account. Publishing your analysis starts with a single click of the Publish button. This first step allows you to embargo the custom URL for your analysis, so only you and individuals you specify can access it. When your manuscript is accepted and published, you can update the final title and journal and click “release.” Releasing your data set gives Ingenuity permission to make your data publicly and perpetually accessible via the custom URL you create. The analysis parameters used in your published study persist with the data set, so each time it is accessed, users will see how you analyzed it. They can modify the analysis parameters but cannot overwrite your protocol.

Great examples of how the Publish feature works can be seen in three recent papers by Variant Analysis customers, including one from this week’s publication of the NCI-60 data set.

Researchers from the National Cancer Institute published data resulting from the sequencing of the protein-coding portions of the NCI-60 human cancer cell line genomes. As described in their Cancer Research publication, the researchers used whole-exome sequence data to generate a catalog of some 60,000 mutations thought to be cancer-related.

To demonstrate the potential value of these data, the researchers also used the Super Learner algorithm to predict the sensitivity of cells harboring type II variants to 103 anti-cancer drugs approved by the FDA and an additional 207 investigational new drugs. They were able to study the correlations between key cancer-related genes and clinically relevant anti-cancer drugs, and predict the outcome.

To make these data broadly available to the global community of cancer researchers, the study’s authors have made them publicly accessible through the CellMiner database, NCI’s Developmental Therapeutics Program, and Ingenuity. If you don’t already have a free Ingenuity account, you’ll be asked to create one; the process takes less than a minute.

Yves Pommier, M.D., Ph.D., chief of the Laboratory of Molecular Pharmacology at the NCI in Bethesda, Md., said in a statement, “Opening this extensive data set to researchers will expand our knowledge and understanding of tumorigenesis as more and more cancer-related gene aberrations are discovered. This comes at a great time, because genomic medicine is becoming a reality, and I am very hopeful this valuable information will change the way we use drugs for precision medicine.”

In another recent paper, Hugh Rienhoff, MD, founder of myDaughtersDNA.com, and collaborators published results from a study that identified a mutation they believe is responsible for his daughter’s undefined syndrome. By partial-genome sequencing of the entire Rienhoff family, this group of collaborators identified a mutation in the gene that encodes the transforming growth factor-β3 (TGF-β3). This mutation has not been previously linked to any disease and is a likely culprit for his daughter’s condition, which includes hypertelorism (broad spacing between the eyes) and bifid uvula (a cleft in the tissue at the back of the palate).

The paper, which was published in the American Journal of Medical Genetics, provides a live custom link to the data hosted by Ingenuity. You can view this supplementary data at: https://variants.ingenuity.com/Rienhoff2013.

Another example comes from the largest hepatocellular carcinoma (HCC) genome sequencing study to date. In a paper published online in Genome Research, the collaborators from the Asian Cancer Research Group not only present a comprehensive genetic landscape of HCC but also reveal a number of new insights into disease biology, such as activation of the JAK/STAT pathway. The raw sequence data are available in public databases and have also been published via Ingenuity so that anyone can analyze them interactively without any additional bioinformatics tools. You can access them here: www.ingenuity.com/acrg2012.

To learn more about how to share or publish an analysis, please watch this short video demonstration.