In this blog series, we’ve been taking a closer look at the Ingenuity Knowledge Base. Our first post covered the manual curation process that we have been using since the database was first built. Today, we look at a more recent program we have put in place to add considerable depth of content to the Knowledge Base engine.
ExpertAssist Findings, launched in 2011, are findings that are automatically extracted from the abstracts of a broad range of recently published biomedical journals and then manually reviewed. We modeled the extraction protocol on the Expert Findings process used for our manual curation. Information that comes in through this avenue is manually reviewed for correct mapping and extraction before being imported into the Knowledge Base. That way we maintain the highest quality, have proper synonym resolution, and capture both contextual details and broad functional relationships, all while ensuring the information is computationally accessible.
These findings are updated weekly from about 3,600 scientific publications. (Our Expert Findings manual curation covers the top 300 journals.) This helps keep the Knowledge Base up to date with the scientific literature and also broadens the types of content we are able to pull into the information engine.
The other way we provide great depth of content is by pulling in more information and curating more biological relationships than any other database. Whether it’s through Expert Findings or ExpertAssist Findings, data pulled into the Knowledge Base is fully contextualized. For example, when integrating a paper about a particular disease, the Knowledge Base will store relevant details from that paper: species, cell gender, cell activation status, the family relationship of patients in the study, zygosity of all subjects, whether a mutation was benign or malignant, missense or nonsense, and much more.
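To make "fully contextualized" a little more concrete, here is a minimal sketch in Python of what a finding record carrying this kind of context might look like. It is purely illustrative and is not the actual Knowledge Base schema: the `Finding` class, its field names, the `filter_by_context` helper, and the placeholder values are all hypothetical, chosen only to mirror the details listed above.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional

# Hypothetical field names; the real Knowledge Base schema is not shown in this post.
CORE_FIELDS = ("subject", "relationship", "target", "source")


@dataclass
class Finding:
    """Illustrative record: a relationship plus optional experimental context."""
    subject: str                      # e.g. a gene or variant (placeholder names below)
    relationship: str                 # e.g. "associated_with"
    target: str                       # e.g. a disease or phenotype
    source: str                       # "Expert" or "ExpertAssist"
    species: Optional[str] = None
    cell_gender: Optional[str] = None
    cell_activation_status: Optional[str] = None
    family_relationship: Optional[str] = None
    zygosity: Optional[str] = None
    mutation_effect: Optional[str] = None   # "benign" or "malignant"
    mutation_type: Optional[str] = None     # "missense" or "nonsense"

    def context(self) -> dict:
        """Return only the context fields that were actually captured."""
        return {
            key: value
            for key, value in asdict(self).items()
            if key not in CORE_FIELDS and value is not None
        }


def filter_by_context(findings: List[Finding], **criteria) -> List[Finding]:
    """Keep findings whose captured context matches every given criterion."""
    return [
        f for f in findings
        if all(f.context().get(key) == value for key, value in criteria.items())
    ]


# Placeholder data only; these are not real curated findings.
example = Finding(
    subject="GENE_A",
    relationship="associated_with",
    target="DISEASE_X",
    source="ExpertAssist",
    species="human",
    zygosity="heterozygous",
    mutation_type="missense",
)
print(example.context())
```

Keeping the context as structured fields rather than free text is what makes a query such as `filter_by_context(findings, species="human", mutation_type="missense")` possible, which is the sense in which the information stays computationally accessible.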
More recently, we have been making significant investments in expanding our coverage of human genetic variation as mapped to disease phenotype, as well as RNA isoform-related content. These coverage investments are driven by the needs of researchers trying to understand and interpret data from next-generation sequencing (NGS) technologies, and of clinical labs interpreting new sequence-based tests. The addition of content resources from BIOBASE, including HGMD and PGMD, among others, has significantly expanded our hereditary disease and pharmacogenomic coverage, respectively. When you combine this with new user-driven just-in-time bibliography (JIT-B) support, which is under development as part of Ingenuity Clinical, we expect the Ingenuity Knowledge Base will easily maintain its gold-standard status as the most comprehensive, high-quality, computable, and up-to-date source for biomedical literature for as long as researchers find it valuable.
The mantra behind the Ingenuity Knowledge Base has always been that pulling together as much information as possible in the most consistent and high-quality way will be the best way to help scientists answer any kind of question they might have about their experimental results. Our team is constantly looking for ways to add even more to the Knowledge Base.
In our next Ingenuity Knowledge Base blog post, we’ll put together the pieces we’ve talked about with a look at the content integration process that is crucial to keeping all of this information interoperable.