How much knowledge are we missing when we rely only on public databases?

Most biomedical knowledge graphs today are built from public databases, but few people realize how incomplete these sources are compared with what is published in the PubMed literature.

We recently compared our AI-extracted knowledge graph IKraph with 40+ public databases.

The results were eye-opening.

📊 Figure 1:

Across five major relation types — gene–gene, chemical–chemical, gene–disease, chemical–disease, and chemical–gene —

IKraph captured millions more relationships than those present in all public databases combined.
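
For readers curious how a per-relation-type comparison like this can be counted, here is a minimal Python sketch. It is not the IKraph pipeline: the triples are made-up placeholders, the names `literature_kg`, `public_dbs`, `normalize`, and `coverage_by_type` are hypothetical, and treating every pair as undirected is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical toy triples: (relation_type, entity_1, entity_2).
# Placeholder names only; these are not real IKraph or database records.
literature_kg = {
    ("gene-gene", "GENE_A", "GENE_B"),
    ("gene-gene", "GENE_B", "GENE_C"),
    ("gene-disease", "GENE_A", "DISEASE_X"),
    ("chemical-gene", "CHEM_1", "GENE_A"),
}
public_dbs = {
    "db_one": {("gene-gene", "GENE_B", "GENE_A")},
    "db_two": {("gene-disease", "GENE_A", "DISEASE_X")},
}


def normalize(triple):
    """Order the two entities so (A, B) and (B, A) count as one relation.

    Assumption: relations are treated as undirected for counting purposes.
    """
    rel, a, b = triple
    return (rel, *sorted((a, b)))


def coverage_by_type(kg, dbs):
    """Count relations per type in the KG, in all databases combined,
    and in the KG only (absent from every database)."""
    combined = {normalize(t) for db in dbs.values() for t in db}
    counts = defaultdict(lambda: {"kg": 0, "public": 0, "kg_only": 0})
    for t in {normalize(t) for t in kg}:
        counts[t[0]]["kg"] += 1
        if t not in combined:
            counts[t[0]]["kg_only"] += 1
    for t in combined:
        counts[t[0]]["public"] += 1
    return dict(counts)


result = coverage_by_type(literature_kg, public_dbs)
for relation_type, c in sorted(result.items()):
    print(relation_type, c)
```

In practice the hard part is upstream of this counting step: entity names from the literature and from each database have to be mapped to shared identifiers before the sets can be compared fairly.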

📈 Figure 2:

When compared with specialized disease–mutation databases like CIViC, ClinPGx, and OncoKB,

IKraph still contained:

• 10–100× more variants

• More diseases, genes, and drugs connected to those variants

• Far more related articles

These numbers tell a simple story:

➡️ Most biomedical knowledge in literature never makes it into public databases.

➡️ AI-driven literature extraction can uncover this hidden knowledge at scale.

This gap matters.

Knowledge graphs built only from public databases miss critical biomedical insights — limiting what we can achieve in drug repurposing, biomarker discovery, and hypothesis generation.

In the coming posts, I’ll share examples showing how a comprehensive knowledge graph like IKraph can transform drug repurposing outcomes. Stay tuned!

________________________________________

💡 What’s your experience with using public databases vs. literature-derived data?

Do you think AI can finally close this knowledge gap?

#AI #KnowledgeGraph #DrugDiscovery #Bioinformatics #Pharmacovigilance #DrugRepurposing

Figure 1: Literature vs. Public Databases on Relation Coverage. IKraph extracted far more biomedical relations from PubMed, including chemical–gene, gene–gene, and gene–disease connections, than those found across 40+ public databases. 👉 Most published relationships never make it into structured public datasets.

Figure 2: IKraph vs. Variant Databases. Compared with CIViC, ClinPGx, and OncoKB, IKraph contains orders of magnitude more data: more diseases, variants, genes, drugs, and supporting articles. 👉 AI-powered extraction from literature dramatically expands biomedical knowledge coverage.
