Content Sources Powering the Ingenuity Knowledge Base Part I: Major NCBI Databases


The Ingenuity Knowledge Base, which powers all of the QIAGEN Ingenuity web applications, incorporates data from a large number of sources. We devote much of our attention to high-quality content manually curated from the published literature. But our investments don’t stop there, because scientific knowledge doesn’t stop there either. The Knowledge Base is a nexus for structuring and integrating almost any type of biomedical content and making it computable, to help biomedical researchers and clinicians understand and interpret the biological meaning of their data.

In this blog series, we’ll take a look at how we go beyond manually curated content by integrating it with data from public and privately funded databases. Hopefully this will provide a better sense of the scope and utility of what’s inside the Knowledge Base.

As you may recall from this recent blog post, we structure and integrate all of this content using QIAGEN’s Ingenuity Ontology. It is the framework we use to organize and describe biological evidence, and it is what allows us to integrate data from disparate sources so that users can ask questions across all of them and get coherent answers and predictive hypotheses. While other scientific taxonomies tend to be isolated, our ontology integrates all of the content with consistent terms and references. That careful structure lets us add new information and keep existing information up to date without having to reclassify existing data. The idea was simple: there’s a lot of insight to be extracted from a very large, horizontally and vertically integrated knowledge base, so that is what we built. Now let’s see how we feed it.

We begin with the major databases hosted by the National Center for Biotechnology Information (NCBI), as these are often the first stop for scientists looking to put their experimental data in context. NCBI is one of the most trusted sources of information in genomics, and with good reason: its experts do a remarkable job of building, curating, and maintaining top-notch repositories.

Three of the NCBI databases we integrate directly into the Knowledge Base are EntrezGene, RefSeq, and OMIM. As NCBI describes it, EntrezGene (sometimes just called Gene) “supplies gene-specific connections in the nexus of map, sequence, expression, structure, function, citation, and homology data.” Each gene is assigned a unique identifier that is used across NCBI databases, which makes it more efficient to work across them.

RefSeq, officially the Reference Sequence collection, includes annotated DNA, RNA, and protein sequences. The NCBI Handbook describes it this way: “Similar to a review article, a RefSeq is a synthesis of information integrated across multiple sources at a given time. RefSeqs provide a foundation for uniting sequence data with genetic and functional information.”

OMIM, short for Online Mendelian Inheritance in Man, is frequently used when studying disease associations. Hosted by NCBI, it is built on a collection originally published by Victor McKusick at Johns Hopkins and is still curated by scientists at the university’s School of Medicine. It is frequently updated with genetic disorders and traits and aims to connect genetic variation with its correlated phenotype.
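To make the role of shared gene identifiers concrete, here is a minimal sketch of how records from the three databases described above could be joined on a common NCBI Gene ID. It is an illustration only, not how the Knowledge Base is implemented: the `GeneRecord` class, the `integrate` function, and the three in-memory tables are hypothetical names invented for this example, and the handful of identifiers shown (for TP53 and BRCA1) are included purely as sample values.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical in-memory stand-ins for three sources, each keyed by NCBI Gene ID.
# Sample values only; a real integration would load these from the databases.
GENE_SYMBOLS: Dict[int, str] = {7157: "TP53", 672: "BRCA1"}
REFSEQ_ACCESSIONS: Dict[int, List[str]] = {7157: ["NM_000546"], 672: ["NM_007294"]}
OMIM_ENTRIES: Dict[int, List[int]] = {7157: [191170], 672: [113705]}


@dataclass
class GeneRecord:
    """One merged view of a gene, assembled from several sources."""
    gene_id: int
    symbol: str
    refseq_accessions: List[str] = field(default_factory=list)
    omim_ids: List[int] = field(default_factory=list)


def integrate(gene_id: int) -> GeneRecord:
    """Join the three tables on the shared gene identifier."""
    if gene_id not in GENE_SYMBOLS:
        raise KeyError(f"unknown gene id: {gene_id}")
    return GeneRecord(
        gene_id=gene_id,
        symbol=GENE_SYMBOLS[gene_id],
        # Copy the lists so callers cannot mutate the source tables.
        refseq_accessions=list(REFSEQ_ACCESSIONS.get(gene_id, [])),
        omim_ids=list(OMIM_ENTRIES.get(gene_id, [])),
    )


record = integrate(7157)
print(record.symbol, record.refseq_accessions, record.omim_ids)
```

Because every table uses the same key, adding another source in this sketch would only mean adding another lookup, without reclassifying the records already in place.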

Scientists who know Ingenuity applications only for gene- and pathway-centric content might be surprised at the vast number of genetic sequences, locus-specific details, and structural genetic information fully integrated through the ontology. This includes connections among genetic sequence variation, phenotype, pathway, and network information that are unavailable as an integrated resource anywhere else, and it’s part of what makes the Knowledge Base unique.

Throughout this series, we’ll be looking at many types of databases, including clinical databases, FDA information, cancer-specific data, and more. For your reference, here is an index of data sources. For a handy graphical representation of the Knowledge Base, click here. Check back soon for our next database snapshot.