
A recent MIT report highlighted a startling statistic: 95% of investments in generative AI have produced zero returns.
This number is alarming, but for those of us in drug discovery and pharmacovigilance, it shouldn’t be surprising.
The problem isn’t the technology itself. The problem is how we measure it.
Most AI tools are deployed without being rigorously evaluated. The prevailing logic seems to be: “It looks like it works, so let’s ship it.”
But in biomedical research, “looks like it works” is not a strategy. It is a liability.
At Insilicom, we take a different approach. We believe in putting our methods through open, international challenges, where they are evaluated fairly and independently.
Why? Because even our own team can make mistakes.
It is incredibly easy to build models that look powerful in-house but fail to generalize in the real world, often because of subtle information leakage that internal teams miss.
By participating in peer-organized challenges, we force our methods to prove themselves.
For example, in the BioCreative VIII challenge, we trained and cross-validated our models on 600 PubMed abstracts. The organizers then tested them on 400 abstracts our models had never seen before.
We performed well, not because we got lucky, but because we knew exactly how our models would behave on unseen data.
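For anyone curious what that separation looks like in code, here is a minimal sketch of a leakage-safe evaluation: cross-validate on the training abstracts only, then score once on a blind held-out set. The data, labels, and model below are toy placeholders for illustration, not our actual BioCreative pipeline.

```python
# Minimal sketch of a leakage-safe evaluation protocol (toy placeholder data).
# The real setup would use the 600 training abstracts and the organizers'
# 400 blind-test abstracts; everything here is illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy stand-ins for labeled abstracts (1 = mentions an adverse event, 0 = does not).
train_texts = [
    "patient developed rash after dosing", "no adverse events were reported",
    "severe nausea was observed in the cohort", "the trial met its safety endpoints",
    "hepatotoxicity led to discontinuation", "treatment was well tolerated",
    "grade 3 neutropenia occurred", "no safety signals were detected",
] * 3
train_labels = [1, 0, 1, 0, 1, 0, 1, 0] * 3
test_texts = ["dizziness and headache were reported",
              "the drug showed a clean safety profile"]
test_labels = [1, 0]

# Keeping the vectorizer inside the pipeline means it is refit on each training
# fold, so nothing about the validation or test text leaks into the features.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Internal estimate: cross-validation on the training set only.
cv_f1 = cross_val_score(model, train_texts, train_labels, cv=3, scoring="f1_macro")
print(f"Cross-validated F1 (training data only): {cv_f1.mean():.3f}")

# One-time external check on abstracts the model has never seen.
model.fit(train_texts, train_labels)
blind_f1 = f1_score(test_labels, model.predict(test_texts), average="macro")
print(f"Blind-test F1: {blind_f1:.3f}")
```

If the blind-test score falls far below the cross-validated one, that gap is exactly the kind of generalization failure an internal-only evaluation would never reveal.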
Most of the AI models deployed today never face this kind of rigorous evaluation. If they had, many would never have been deployed at all, and we wouldn't be seeing a 95% failure rate.
To get ROI from AI, you must move beyond the black box. You need external validation and verifiable benchmarks.
I’d love to hear from my network. How does your organization validate the AI tools you are piloting?
#ArtificialIntelligence #DrugDiscovery #Pharmacovigilance #Biotech #DataScience
