Machine Learning in Drug Discovery

  1. We don’t understand the main mechanism of the disease (the pathways involved in misfolded amyloid accumulation). That means that when we choose a target, we’re not entirely sure whether it is ‘druggable’, that is, whether a drug against it will have the effect that we intend.
  2. We don’t entirely understand the relationship between small-molecule structure and physiological properties, including side effects.
There are three main stages of drug discovery. Stage A involves identifying and validating the chosen target. Stage B involves narrowing down leads and optimizing them based on molecular properties. Finally, Stage C involves clinical trials to check for pharmacodynamic and pharmacokinetic side effects.

Navigating Chemical Space

The entire space of small molecules is made up of more than 10⁶³ molecules. This space is narrowed down to 10²² and eventually 10¹⁴ molecules.

Drug Screening Pipeline
  • Check out this article to learn more about why drugs take two decades to make
  • RoBERTa HuggingFace pre-trained model: check out this transformer model, made by one of my friends and pretrained on the ZINC small-molecule database

Evaluating The Efficacy of Drugs

How is the efficacy of a drug evaluated? Most often, through the free energy of binding between the drug and its target. However, optimizing for binding alone often means that side effects are discovered only later, in clinical trials. Side effects often arise from the pharmacokinetics and pharmacodynamics of a drug, which vary from person to person because mutations change how the drug is metabolized and how well it binds. Likewise, evaluating drugs solely on physical properties such as logP and the outdated Lipinski's rule of five leaves toxicities and side effects undetected.
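
To make the two yardsticks above concrete, here is a minimal Python sketch. It converts a dissociation constant into a standard binding free energy using ΔG = RT ln(Kd), and counts violations of Lipinski's four criteria. The names `binding_free_energy`, `Descriptors`, and `lipinski_violations` are hypothetical, and the candidate's descriptor values are made up for illustration; in practice they would come from a cheminformatics toolkit.

```python
import math
from dataclasses import dataclass

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = R * T * ln(Kd / 1 M); a smaller Kd gives a more negative dG.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container for this sketch)."""
    mol_weight: float       # daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria are violated."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


# Hypothetical candidate with made-up descriptor values.
candidate = Descriptors(mol_weight=350.0, log_p=2.5, h_bond_donors=2, h_bond_acceptors=5)
dg = binding_free_energy(1e-9)  # a 1 nM binder
violations = lipinski_violations(candidate)
print(f"dG at Kd = 1 nM: {dg:.2f} kcal/mol")
print(f"Lipinski violations: {violations}")
```

Neither number says anything about how a given patient metabolizes the molecule or what else it binds, which is why a compound can pass both checks and still fail later.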

Companies Changing the Status Quo

OneThreeBio is revolutionizing the way that small molecule properties are predicted.

Protein Representations

As you saw in the beginning, target acquisition and validation are the first steps in drug discovery. If the small molecule is targeting a protein-protein interaction, then multiple protein surfaces must be characterized in order to design an efficacious small molecule.

Protein and RNA Biologic Discovery with Machine Learning

The astronomical number of possible sequences with just a few nucleotides shows how hard it is to navigate the fitness landscape. Machine learning algorithms that allow us to more quickly traverse the fitness landscape will revolutionize the way that we think about not only protein and RNA biologic drug discovery, but also regular small molecule drug discovery.
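
A quick calculation shows how fast this space grows. The sketch below assumes 4 possible nucleotides per RNA position and 20 possible amino acids per protein position; the function name `sequence_space_size` is made up for this example.

```python
def sequence_space_size(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    if alphabet_size < 1 or length < 0:
        raise ValueError("alphabet_size must be >= 1 and length must be >= 0")
    return alphabet_size ** length


# Compare RNA (4 letters) with protein (20 letters) at a few lengths.
for n in (10, 50, 100):
    rna = sequence_space_size(4, n)
    protein = sequence_space_size(20, n)
    print(f"length {n:>3}: RNA {rna:.2e}   protein {protein:.2e}")
```

Even a 10-nucleotide sequence has more than a million variants, so exhaustive experimental testing stops being realistic almost immediately.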

Navigating Evolutionary Landscapes with Machine Learning

  • Gaussian process models: these models place a prior distribution over functions and, once conditioned on a dataset of input features and labels, yield a posterior distribution that quantifies uncertainty across the feature space. A Gaussian process is specified by a mean function and a covariance function (aka kernel). The kernel encodes how similar two inputs are based on how far apart they lie in the feature space, which determines how strongly their predicted values are correlated.
  • Variational autoencoders: these models are used as a form of representation learning, where the model compresses a given sequence or structure into a latent space. Each latent variable is a learned, generally nonlinear, function of the original features. These latent variables are often more conceptually valuable than the original variables themselves.
  • Repository of DNA and RNA binding trained models for genomics — Kipoi
  • RNA splicing classification GitHub code in Keras (convolutional neural network (CNN))
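
As a rough illustration of the Gaussian process idea in the list above, the sketch below fits a zero-mean GP with a squared-exponential kernel to a toy one-dimensional "fitness landscape" using NumPy. The helper names (`rbf_kernel`, `gp_posterior`), the sine-shaped landscape, and the `length_scale` and `noise` values are all assumptions made for this example, not taken from any of the linked tools.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel: similarity decays with squared distance."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6, length_scale=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test, length_scale)
    k_ss = rbf_kernel(x_test, x_test, length_scale)

    # Cholesky factorization gives a numerically stable solve of k_tt.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha

    v = np.linalg.solve(chol, k_ts)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Toy data: four "measured" variants on a sine-shaped landscape.
x_train = np.array([[-2.0], [-1.0], [0.0], [1.5]])
y_train = np.sin(x_train).ravel()

# One measured point, one nearby point, one point far from all data.
x_test = np.array([[0.0], [0.5], [6.0]])
mean, std = gp_posterior(x_train, y_train, x_test)

for x, m, s in zip(x_test.ravel(), mean, std):
    print(f"x = {x:4.1f}   predicted = {m:6.3f}   uncertainty = {s:.3f}")
```

The uncertainty is near zero at a measured point and grows back toward the prior far from the data. That is the property that makes these models useful for deciding which variant to test next.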

Using ML to Understand Biological Mechanisms

The main way that we can begin to design more efficacious drugs is by better understanding the biological mechanisms behind disease.

ML in Biophysics

Understanding biological mechanisms means understanding the detailed and complex interactions that take place between the macromolecules in our cells.

Company Case Study — Atomwise


Undruggable Targets

Discovering a drug isn’t as simple as choosing an important protein participating in a vital signal transduction cascade — that’s what makes diseases like cancer so hard to treat. Instead, many targets have become known as ‘undruggable’ — this is the next main challenge that we must face in drug discovery.

Daphne Koller — “Many (perhaps most) of the “low-hanging fruit” — druggable targets that have a significant effect on a large population — have been discovered. If so, then the next phase of drug development will need to focus on drugs that are more specialized — whose effects may be context-specific, and which apply only to a subset of patients.”

KRAS and MYC are probably two of the most notorious undruggable targets in all of biomedicine. Drugging them just might be the solution to cancer.

Undruggable targets typically:
  1. Function through protein-protein interactions
  2. Are intrinsically disordered (they don’t have a rigid static structure) (>80% of proteins fall into this category)
  • Delphi — PPI prediction with solely sequence information


Ultimately, deep learning will revolutionize the way that we think about medicine, from helping us understand our connectome to helping us gain a more thorough understanding of biological pathways and mechanisms.

  • Deep Learning for the Life Sciences, by Bharath Ramsundar
  • Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again, by Eric Topol

Key Takeaways

  • Machine learning is helping us understand the relationship between structure and function, both for therapeutics and small molecules themselves — mainly through computing abstract representations.
  • Navigating physicochemical space manually is time- and cost-intensive. With active learning, ML can now help design the experiments themselves, minimizing the time and money spent.
  • Biological systems are complex, with dynamic relationships between nodes and interactions. While we cannot possibly know and measure every variable, ML allows us to extract what’s important.



Mukundh Murthy

Innovator passionate about the intersection between structural biology, machine learning, and cheminformatics. Currently @ 99andbeyond.