One of the toughest challenges in automated drug repurposing and hypothesis generation is validation.
Most AI models are tested on data that already include the answers, which is a serious form of information leakage. The model looks good on paper but fails in real-world use.
To solve this, we designed what we now call a time-aware validation approach.
Here’s how it works:
We select multiple time cutoffs and use only the knowledge published before each cutoff to make predictions. Then, we use the literature and clinical trial data published after that cutoff to validate the predictions.
This amounts to a pseudo-real-time experiment: it lets us test how well our model would have performed if it had been running years ago.
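The split described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline from our studies: the `Finding` record, the function names, and the choice to treat the cutoff date itself as "after" are all assumptions made for the example.

```python
from datetime import date
from typing import NamedTuple


class Finding(NamedTuple):
    """Hypothetical record: one published drug-disease link and its date."""
    drug: str
    disease: str
    published: date


def split_by_cutoff(findings, cutoff):
    """Split findings into those published before the cutoff and the rest.

    Assumption: a finding dated exactly on the cutoff counts as "after",
    so it can never be used to make predictions.
    """
    before = [f for f in findings if f.published < cutoff]
    after = [f for f in findings if f.published >= cutoff]
    return before, after


def validated_fraction(predictions, later_findings):
    """Fraction of predicted (drug, disease) pairs supported by later findings."""
    unique_predictions = set(predictions)
    if not unique_predictions:
        return 0.0
    supported = {(f.drug, f.disease) for f in later_findings}
    hits = len(unique_predictions & supported)
    return hits / len(unique_predictions)
```

The model would be built from `before` only, and `after` is touched only when scoring, which is what keeps the answers out of the training data.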
For example, in our COVID-19 drug repurposing study, we used publications available up to April 2020 to make predictions and then monitored their validation every month through PubMed and clinical trials databases.
About one-third of the drugs we predicted as candidates were later supported by independent studies or trials, a remarkable result given that these predictions were made in real time.
The remaining two-thirds are not necessarily misses: some may still be validated by future studies.
We applied the same time-aware approach to cystic fibrosis drug repurposing, starting from the 1980s and validating each year’s predictions using only information published later. This historical backtesting provided a realistic estimate of how predictive our model truly is.
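A backtest over several yearly cutoffs might look like the sketch below. Again, this is a simplified illustration under stated assumptions: records are reduced to `(drug, disease, year)` tuples, `predict` stands in for whatever model is being evaluated, and the score is simply the share of new predictions that appear in later publications.

```python
from typing import Callable, Iterable, Optional

Record = tuple[str, str, int]  # (drug, disease, publication year), hypothetical
Pair = tuple[str, str]         # (drug, disease)


def backtest(
    records: Iterable[Record],
    cutoff_years: Iterable[int],
    predict: Callable[[set[Pair]], Iterable[Pair]],
) -> dict[int, Optional[float]]:
    """For each cutoff year, predict from earlier records and score on later ones.

    Returns a mapping from cutoff year to the fraction of new predictions
    that were later published, or None when no new predictions were made.
    """
    records = list(records)
    results: dict[int, Optional[float]] = {}
    for year in cutoff_years:
        # Knowledge available to the model: strictly before the cutoff year.
        known = {(drug, disease) for drug, disease, y in records if y < year}
        # Validation set: pairs first reported in or after the cutoff year.
        later = {(drug, disease) for drug, disease, y in records if y >= year} - known
        # Only genuinely new predictions count; re-stating known pairs does not.
        predicted = set(predict(known)) - known
        results[year] = len(predicted & later) / len(predicted) if predicted else None
    return results
```

Passing the model in as a function makes it hard for it to see anything beyond `known`, so a leak has to be introduced deliberately rather than by accident.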
In our view, every researcher developing AI-based drug repurposing or automated hypothesis generation methods should adopt time-aware validation.
Without it, many reported successes risk being artifacts of information leakage: the models perform well only because the data already contain the answers.
This, in my opinion, is one of the biggest issues facing current AI models in biomedical research. Real progress will come only when we validate predictions as if we truly lived in that past moment.
#AI #DrugDiscovery #KnowledgeGraph #MachineLearning #DrugRepurposing #BiomedicalAI #Validation

