Most biomedical knowledge graphs today are built from public databases, but few people realize how incomplete these sources are compared with what is published in the PubMed literature.
We recently compared our AI-extracted knowledge graph, IKraph, with 40+ public databases.
The results were eye-opening.
📊 Figure 1:
Across five major relation types — gene–gene, chemical–chemical, gene–disease, chemical–disease, and chemical–gene —
IKraph captured millions more relationships than those present in all public databases combined.
📈 Figure 2:
When compared with specialized disease–mutation databases like CIViC, ClinPGx, and OncoKB,
IKraph still contained:
• 10–100× more variants
• More diseases, genes, and drugs connected to those variants
• Far more related articles
These numbers tell a simple story:
➡️ Most biomedical knowledge in literature never makes it into public databases.
➡️ AI-driven literature extraction can uncover this hidden knowledge at scale.
This gap matters.
Knowledge graphs built only from public databases miss critical biomedical insights — limiting what we can achieve in drug repurposing, biomarker discovery, and hypothesis generation.
In upcoming posts, I’ll share examples showing how a comprehensive knowledge graph like IKraph can transform drug repurposing outcomes. Stay tuned!
________________________________________
💡 What’s your experience with using public databases vs. literature-derived data?
Do you think AI can finally close this knowledge gap?
#AI #KnowledgeGraph #DrugDiscovery #Bioinformatics #Pharmacovigilance #DrugRepurposing

Figure 1: Literature vs. Public Databases on Relation Coverage. IKraph extracted far more biomedical relations from PubMed, including chemical–gene, gene–gene, and gene–disease connections, than those found across 40+ public databases. 👉 Most published relationships never make it into structured public datasets.

Figure 2: IKraph vs. Variant Databases. Compared with CIViC, ClinPGx, and OncoKB, IKraph contains orders of magnitude more data: more diseases, variants, genes, drugs, and supporting articles. 👉 AI-powered extraction from literature dramatically expands biomedical knowledge coverage.
