Machine Learning in Drug Discovery

  1. We don’t understand the main mechanism of the disease (the pathways involved in misfolded amyloid accumulation). That means that when we choose a target, we’re not entirely sure if it is ‘druggable’, that is, whether acting on it will have the effect that we intend.
  2. We don’t entirely understand the relationship between small molecule structure and physiological properties and side effects.
There are three main stages of drug discovery. Stage 1 involves identifying and validating the chosen target. Stage 2 involves narrowing down leads and optimizing them based on molecular properties. Finally, Stage 3 involves clinical trials to check pharmacodynamics, pharmacokinetics, and side effects.

Navigating Chemical Space

Drug Screening Pipeline
  • Check out this article to learn more about why drugs take two decades to make
  • RoBERTa pre-trained model on HuggingFace: a transformer model pretrained on the ZINC small molecule database, made by one of my friends

Evaluating The Efficacy of Drugs

Companies Changing the Status Quo

OneThreeBio is revolutionizing the way that small molecule properties are predicted.

Protein Representations

Protein and RNA biologic discovery with machine learning

Navigating evolutionary landscapes with machine learning

  • Gaussian process models: these models place a prior distribution over functions and, after conditioning on a dataset of input features and labels, yield a posterior distribution whose spread quantifies uncertainty across the feature space. A Gaussian process is specified by a mean function and a covariance function (aka kernel). Evaluated on the data, these give a mean vector and a covariance matrix, where the kernel encodes how similar two inputs are, typically based on how far apart they are in the feature space.
  • Variational autoencoders: these models are used as a form of representation learning. An encoder compresses a given sequence or structure into a latent space, and a decoder reconstructs the input from it. Each latent variable is a learned, generally nonlinear function of the original features. These latent variables are often more conceptually valuable than the original variables themselves.
  • Repository of DNA and RNA binding trained models for genomics — Kipoi
  • RNA splicing classification GitHub code in Keras (convolutional neural network (CNN))
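To make the Gaussian process description above more concrete, here is a minimal NumPy sketch of computing a posterior mean and uncertainty. It assumes a zero mean function, a squared-exponential (RBF) kernel, and a tiny made-up dataset; the function names `rbf_kernel` and `gp_posterior` are hypothetical and do not come from any of the tools linked here.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between the rows of a and b (assumed kernel)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6, **kernel_args):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test, **kernel_args)
    k_ss = rbf_kernel(x_test, x_test, **kernel_args)

    # Cholesky factorization instead of an explicit matrix inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha

    v = np.linalg.solve(chol, k_ts)
    cov = k_ss - v.T @ v
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std


# Toy 1-D data: three "measured" points, two query points (illustrative only).
x_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 1.0, 0.5])
x_test = np.array([[1.0], [5.0]])

mean, std = gp_posterior(x_train, y_train, x_test)
print("posterior mean:", mean)
print("posterior std: ", std)
```

The first query point coincides with a training point, so its predicted uncertainty is small. The second lies far from all the data, so the prediction falls back toward the prior and the uncertainty grows. That uncertainty estimate is what makes these models useful for deciding which variant or molecule to measure next.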

Using ML to Understand Biological Mechanisms

ML in Biophysics

Company Case Study — Atomwise


Undruggable Targets

Daphne Koller — “Many (perhaps most) of the “low-hanging fruit” — druggable targets that have a significant effect on a large population — have been discovered. If so, then the next phase of drug development will need to focus on drugs that are more specialized — whose effects may be context-specific, and which apply only to a subset of patients.”

  1. Function through protein-protein interactions
  2. Are intrinsically disordered (they don’t have a rigid static structure) (>80% of proteins fall into this category)
  • Delphi — PPI prediction with solely sequence information


  • Deep Learning in the life sciences — Bharath Ramsundar
  • Deep Medicine: How Artificial Intelligence Can Make Healthcare Human — Eric Topol

Key Takeaways

  • Machine learning is helping us understand the relationship between structure and function, both for therapeutics and small molecules themselves — mainly through computing abstract representations.
  • Navigating through physicochemical space manually is time- and cost-intensive. Active learning now lets ML help design the experiments themselves, minimizing the time and money spent.
  • Biological systems are complex, with dynamic relationships between nodes and interactions. While we cannot possibly know and measure every variable, ML allows us to extract what’s important.




Innovator passionate about the intersection between structural biology, machine learning, and cheminformatics. Currently @ 99andbeyond.

Mukundh Murthy
