In this blog series, we’ve been taking a closer look at the Ingenuity Knowledge Base. Our other posts described the manual curation process we use to ensure the highest quality of information in our database and the ExpertAssist Findings that add depth of content. Today, we look at how content from many disparate sources is integrated to make information computable across the Knowledge Base so it can be used to power Ingenuity Variant Analysis, Ingenuity Pathway Analysis, and our new Ingenuity Clinical tool.
The Ingenuity Knowledge Base pulls content from dozens of different sources, from publicly accessible scientific and clinical databases to peer-reviewed journals and more. On the database front, this includes findings and annotations from major NCBI databases (EntrezGene, RefSeq, OMIM disease associations), targets and pharmacological relevance of FDA-approved and clinical trial drugs, clinical biomarkers, Gene Ontology annotations, a normal gene expression body atlas for more than 30 tissues and the NCI-60 panel of cancer cell lines, microRNA-mRNA target databases and GWAS databases. For journals, we curate information from nearly 4,000 scientific publications. And now with the acquisition of BIOBASE, we are expanding coverage to include HGMD, PGMD and others.
But of course it would be of limited utility if we just aggregated this information in its original state. So our team developed QIAGEN’s Ingenuity Ontology, a framework for organizing and describing biological evidence that allows users to ask questions across all of these data sources and get a coherent answer back. Other taxonomies in the scientific realm tend to be isolated, but our ontology offers a way to integrate all of the content together with consistent terms and references. That careful structure lets us add information all the time without having to reclassify existing data. The idea was simple: there’s a lot of insight that can be extracted from a very large, horizontally and vertically integrated Knowledge Base.
Indeed, it turns out that when information is well integrated, users of Ingenuity products can build remarkably rich models and get a clearer view of how the parts of a biological system relate to the whole. Pulling together separate information sources about variants and about disease, for instance, allows for the discovery of many connections that may not otherwise be obvious. That provides for more predictive analysis and more comprehensive views of biological network behavior.
Also, because we store everything in a very consistent way, the data is useful both inside and outside of the Ingenuity Knowledge Base. We support public identifier sets such as RefSeq IDs, microarray chip numbers and genetic coordinates, which allow users to import their own data and link it to information from external sources. That keeps our content interoperable and maximizes its value for users.
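To make the idea of linking through public identifiers concrete, here is a minimal sketch in Python. It is not how the Knowledge Base is implemented; the index, the concept labels, the field names and the function names are all assumptions made up for illustration. It shows user rows keyed by RefSeq accession being matched to an internal concept after light normalization.

```python
# Hypothetical identifier index: public RefSeq accession -> internal gene concept.
# The entries and the "gene:..." labels are illustrative assumptions only.
REFSEQ_TO_CONCEPT = {
    "NM_000546": "gene:TP53",
    "NM_007294": "gene:BRCA1",
}


def normalize_accession(accession):
    """Trim whitespace, uppercase, and drop a version suffix such as '.6'."""
    return accession.strip().upper().split(".")[0]


def link_rows(rows, index=REFSEQ_TO_CONCEPT):
    """Attach a concept to each user row; unmatched rows get concept=None."""
    linked = []
    for row in rows:
        key = normalize_accession(row["refseq_id"])
        linked.append({**row, "concept": index.get(key)})
    return linked


# Made-up user data, e.g. rows from an expression experiment.
user_rows = [
    {"refseq_id": "NM_000546.6", "fold_change": 2.4},
    {"refseq_id": " nm_007294 ", "fold_change": -1.8},
    {"refseq_id": "NM_999999", "fold_change": 0.3},
]
linked_rows = link_rows(user_rows)
```

Rows whose identifier is not in the index are kept with an empty concept rather than dropped, so an importer can report what failed to link.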
Content integration isn’t just about bringing in massive amounts of data and formatting it consistently. Our framework also filters content, testing for structural integrity and other factors. That lets us alert a public database when we find an error, which is a handy way to give back to the community, or flag scientific content that seems implausible so that the original source gets carefully reviewed. Further, the ontology provides consistent layers of abstraction and makes extensive use of synonyms to map terms across the different resources. This framework makes it easy to traverse terminology that would otherwise stay siloed within individual domains, journals and research groups, and it accommodates the natural evolution of scientific understanding and terminology over time.
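The two mechanisms described above, synonym mapping and structural checks, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Ingenuity Ontology itself: the tiny ontology, the parent categories, the required fields and every function name here are hypothetical, and the example findings are made up.

```python
# Hypothetical mini-ontology: canonical term -> parent category and synonyms.
ONTOLOGY = {
    "TP53": {"parent": "transcription regulator",
             "synonyms": ["p53", "tumor protein p53"]},
    "ERBB2": {"parent": "receptor tyrosine kinase",
              "synonyms": ["HER2", "NEU"]},
}


def build_synonym_index(ontology):
    """Map every canonical name and synonym (case-insensitive) to its canonical term."""
    index = {}
    for canonical, entry in ontology.items():
        for name in [canonical, *entry["synonyms"]]:
            index[name.casefold()] = canonical
    return index


SYNONYM_INDEX = build_synonym_index(ONTOLOGY)


def resolve(term):
    """Return the canonical term for a name, or None if the name is unknown."""
    return SYNONYM_INDEX.get(term.strip().casefold())


# Assumed shape of a finding; a real schema would be much richer.
REQUIRED_FIELDS = ("subject", "relation", "object", "source")


def integrity_problems(finding):
    """List structural problems: missing fields, then terms the ontology cannot resolve."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not finding.get(f)]
    for role in ("subject", "object"):
        term = finding.get(role)
        if term and resolve(term) is None:
            problems.append(f"unknown term in {role}: {term}")
    return problems


# Made-up findings used only to exercise the checks.
good = {"subject": "p53", "relation": "example_relation",
        "object": "HER2", "source": "example journal"}
bad = {"subject": "p53", "relation": "example_relation", "object": "XYZ1"}
```

With this setup, `resolve("HER2")` and `resolve("neu")` both return `"ERBB2"`, while `integrity_problems(bad)` reports the missing source and the unresolved term, which is the kind of record that would be flagged for review.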
In our next Ingenuity Knowledge Base blog post, we’ll take a look at the bigger picture and how the Knowledge Base powers advanced algorithms that speed new scientific discoveries in ways no other web-based bioinformatics system has in the past.