How Metabolomics Will Lead the Way for Preventative Medicine

Mukundh Murthy
8 min read · Sep 21, 2019


You’ve probably gone to the doctor and had a blood test. The blood test gives you your A1C (a measure of your average blood glucose over the past few months). Or maybe you’ve gotten a urine test done that measures your creatinine levels. Why do doctors do these tests? They measure the levels of certain biomarkers to diagnose whether you have diabetes or whether your kidneys aren’t functioning properly. But biological systems are way more complex. There isn’t just one marker in our blood or urine: there are thousands of potential biomarkers (aka metabolites) that interact with each other to form extremely complex networks. Below is a figure showing almost all human biochemical pathways.

I know how you’re feeling right now… 😱 It’s probably the most condensed graphic you’ve ever seen in your life. In high school, most students learn about cellular respiration (glycolysis, Krebs cycle, etc.). Beyond that, we don’t really learn what metabolism means. This creates the illusion that metabolism and cell biochemistry are simple. (It’s ok though, you don’t have to memorize all the biochemical pathways in this image to understand the rest of this article 😉)

But how can such complexity give rise to new innovation and discovery?

Ok, let’s take a step back. Remember the familiar situation from the beginning of this article? In that situation, the doctor can tell you about diabetes or kidney disease and provide a diagnosis only after the fact. The doctor might prescribe some medicine, but there is one thing you can’t avoid acknowledging: You are now stuck with this disease. You can’t go back in time to prevent whatever led to the disease. You are stuck with it, period.

But what if I told you that there might come a day when doctors can do a simple blood test and tell you not what conditions you already have, but which diseases you are heading towards and how you can prevent them with the correct action? This day can become a reality, thanks to the thousands of unnoticed metabolites lurking in our system.

What are metabolites?

Metabolites are small organic molecules that take part in nearly every aspect of a cell’s life. They help regulate cell division and cell nutrition, as well as other important life processes.

Thus, while proteomics, genomics, and transcriptomics have gained most of the attention over the past few years, metabolites are what it comes down to when analyzing most diseases. Start with the central dogma: genes code for RNAs, which code for proteins. But it doesn’t end there. Proteins modify metabolites and produce new metabolites. Therefore, while analyzing DNA, RNA, and proteins can be very useful, analyzing the metabolome directly provides information that can be crucial to curing disease.

Let me explain using an analogy from my 9th grade world history class: When trade happened hundreds of years ago, three main groups were involved between the production of an item and its sale: the manufacturers, the middlemen, and the vendors. The middlemen would add taxes and manipulate the price of the product. Thus the price a vendor charged for a product in no way reflected the original price. RNAs and proteins are like these “middlemen.” It is difficult to deduce which small molecules are present, and at what concentrations, solely by looking at the DNA, RNA, and proteins. This is how metabolomics adds a new perspective.

The central dogma of biology is often taught and presented as shown above (DNA to RNA to Protein). However, it is rarely extended to metabolites.

The Main Challenge: Identifying and Analyzing Metabolite Networks

Comparing the concentrations of certain metabolites with one another can provide insight into the underlying pathology. But there is no easy way to do this.

First of all, not all metabolites have a confirmed chemical structure.

This is a problem, because analytical techniques such as NMR (nuclear magnetic resonance) and mass spectrometry usually identify a metabolite by matching its measured signal against reference data for compounds with known structures. However, this problem is being addressed through the creation of large databases such as the Human Metabolome Database. It now stores information about the chemical structures of more than 40,000 metabolites!

The more pressing problem, however, is that analyzing relationships between hundreds of metabolites requires complex statistical analysis. What makes this task even more challenging than it seems is that there are so many independent variables. When we learn about experimental design, we are told to only change one independent variable, but this is impossible when collecting metabolome data.

  1. How can researchers determine the underlying pathology of a disease when the metabolomes that they are analyzing belong to people with different prescriptions, height, weight, water intake, calorie consumption, etc.?
  2. Not only do researchers have to consider these factors, but they also have to consider natural biological fluctuations, including changes in metabolite concentration due to fluctuations in pH, enzyme concentration, and temperature.

Statistical Techniques and Network Generation

Before statistical analysis is even considered, researchers must eliminate noise from the data. Noise in this context commonly refers to solvent effects. In addition, the data must be properly scaled and normalized. Once this is done, they are finally ready to perform the analysis.
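To make "scaled and normalized" a little more concrete, here is a minimal sketch of one common recipe: a log transform followed by autoscaling (mean 0, standard deviation 1). This particular recipe and the function names are my assumptions, not something a specific paper in this article prescribes, and the concentrations are made up.

```python
import math
import statistics

def log_transform(values):
    """Log2-transform concentrations; assumes every value is strictly positive."""
    return [math.log2(v) for v in values]

def autoscale(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical concentrations of ONE metabolite across five samples (made-up numbers).
raw = [1.0, 2.0, 4.0, 8.0, 16.0]
scaled = autoscale(log_transform(raw))
print([round(v, 3) for v in scaled])
```

After this step, every metabolite is on the same scale, so a metabolite that happens to be very abundant doesn’t dominate the later analysis just because its numbers are bigger.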

Common statistical techniques used to process the data include:

  • t-tests
  • F-tests

I won’t go through these techniques in detail here, but the main idea is that they compare metabolite measurements between groups of samples (for example, patients and healthy controls): t-tests compare the group means and F-tests compare the group variances. In each case, you try to reject the null hypothesis that the two groups have equal means (or equal variances). After doing this type of analysis, you’ll usually end up with something like the graph shown below.
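As a rough illustration of what these tests compute, the sketch below calculates a Welch t statistic (for means) and an F ratio (for variances) for one metabolite in two groups. The group names and numbers are hypothetical, and I’m assuming the Welch variant of the t-test, which doesn’t require equal variances. Turning these statistics into p-values needs the t and F distributions, which a statistics library such as SciPy (for example `scipy.stats.ttest_ind`) provides.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se

def f_ratio(group_a, group_b):
    """F statistic: ratio of the two sample variances."""
    return statistics.variance(group_a) / statistics.variance(group_b)

# Hypothetical concentrations of one metabolite in two groups (made-up numbers).
healthy = [4.8, 5.1, 5.0, 4.9, 5.2]
disease = [6.0, 6.4, 5.8, 6.1, 6.2]

t_stat = welch_t(healthy, disease)
f_stat = f_ratio(healthy, disease)
print(f"t = {t_stat:.2f}, F = {f_stat:.2f}")
```

A t statistic far from 0 suggests the group means differ, and an F ratio far from 1 suggests the variances differ. In a real study this would be repeated for every metabolite, so some correction for multiple testing would also be needed.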

This bar graph shows how the concentration of two specific metabolites varies between two different samples.

After going through this step of the analysis, the researchers finally attempt to create networks.

Bayesian Networks

Although the bar graph above provides useful initial information, we still haven’t answered the main question: How do all the metabolites that were detected correlate with each other? This is why network creation is helpful. It shows how each metabolite relates to other metabolites and under what conditions they correlate with each other. Each metabolite is depicted as a node and these “conditional dependencies” are shown through the lines that connect the nodes. Eventually, you’ll end up with something like what’s displayed below.

The figure above is from a research paper published by Myte et al.: “Untangling the role of one-carbon metabolism in colorectal cancer risk: A comprehensive Bayesian network analysis.” The researchers use Bayesian networks to depict the relationships between environmental factors, one-carbon metabolite concentrations, and colorectal cancer (CRC).
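To show what a “conditional dependency” means in practice, here is a toy Bayesian network with three nodes in a chain: exposure → metabolite level → disease. The structure and every probability in it are made up for illustration; they are not taken from the paper above. The key idea is that the joint probability factorizes into one small table per node, conditioned only on that node’s parents.

```python
# Hypothetical network: exposure -> metabolite level -> disease.
# All probabilities below are made up for illustration.
p_exposure = {True: 0.3, False: 0.7}             # P(exposure)
p_high_given_exposure = {True: 0.8, False: 0.2}  # P(metabolite high | exposure)
p_disease_given_high = {True: 0.4, False: 0.1}   # P(disease | metabolite high)

def joint(exposure, high, disease):
    """P(exposure, high, disease) = P(exposure) * P(high | exposure) * P(disease | high)."""
    p_e = p_exposure[exposure]
    p_h = p_high_given_exposure[exposure] if high else 1 - p_high_given_exposure[exposure]
    p_d = p_disease_given_high[high] if disease else 1 - p_disease_given_high[high]
    return p_e * p_h * p_d

def p_disease_given_exposure(exposure):
    """Sum over the unobserved metabolite node to get P(disease | exposure)."""
    numerator = sum(joint(exposure, h, True) for h in (True, False))
    denominator = sum(joint(exposure, h, d) for h in (True, False) for d in (True, False))
    return numerator / denominator

print(round(p_disease_given_exposure(True), 2), round(p_disease_given_exposure(False), 2))
```

In this toy chain, the exposure changes disease risk only through the metabolite. Real studies have to learn both the structure and the probabilities from data, which is the hard part.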

Gaussian Graphical Models

Gaussian Graphical Models (GGMs) are another type of network model that researchers use to model the relationships between metabolites.

One way to think of the metabolites in our body is like this: Imagine that there are 1000 people in a room, all talking at once. How could you figure out who’s talking to whom? What if one person is talking to multiple other people? What if one person’s conversation with another person leads that person to call someone else? This is a great way to think about metabolic systems/pathways! What researchers are trying to do is distill all the noise in the room down to the individual conversations and the people in them. In addition, the researchers try to find out how one conversation leads to another.

In GGMs, researchers model relationships using partial correlation coefficients. This means that rather than measuring the overall correlation between two metabolites, which mixes direct and indirect relationships, they measure how strongly two metabolites are related after accounting for all the other metabolites.

Continuing with our previous analogy, imagine that person 1 talks to person 2 and person 2 talks to person 3. The relationship between persons 1 and 2 is direct, while the relationship between persons 1 and 3 is indirect.

GGMs remove all the indirect relationships from the visual. Although indirect relationships are especially important in biochemical pathways (It’s important to know how one thing leads to the other!), showing only the direct relationships can serve as a starting point for scientists and a way to simplify the network.

Below is a GGM representation from “Statistical Methods for the Analysis of High-Throughput Metabolomics Data.” I HIGHLY recommend reading this research paper if you are interested in the more technical aspects of this field!
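The person 1 → person 2 → person 3 analogy can be sketched in a few lines of code. Below, three simulated “metabolites” form a chain, and a partial correlation is used to separate the direct link from the indirect one. The data are simulated and the function names are my own; with only three variables, the simple first-order formula is enough, while real Gaussian graphical models with many metabolites typically get all partial correlations at once from the inverse of the covariance matrix.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_corr(x, y, z):
    """Correlation between x and y after controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Simulated chain: m1 influences m2, and m2 influences m3 (made-up data).
rng = random.Random(0)
m1 = [rng.gauss(0, 1) for _ in range(2000)]
m2 = [a + rng.gauss(0, 0.5) for a in m1]
m3 = [b + rng.gauss(0, 0.5) for b in m2]

r13 = pearson(m1, m3)           # strong, but only because of m2
p13 = partial_corr(m1, m3, m2)  # close to 0: no direct link
p12 = partial_corr(m1, m2, m3)  # clearly non-zero: direct link
print(f"corr(1,3) = {r13:.2f}, partial(1,3 | 2) = {p13:.2f}, partial(1,2 | 3) = {p12:.2f}")
```

The ordinary correlation between metabolites 1 and 3 is high, but once metabolite 2 is accounted for it nearly vanishes, so the network would draw edges 1–2 and 2–3 and leave out 1–3.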

Applications of the field

Metabolomics connects to all the other -omics fields! In fact, networks like the one shown above are used to show the relationships between a gene, its RNA transcript, the protein that’s produced, and any associated metabolites. This is called integration, and integration helps researchers derive more biologically relevant conclusions from these analyses.

As we begin to understand the hidden metabolites in each of our cells, we can begin to personalize medicine for every person based on their specific metabolite concentrations! In fact, some of the metabolites might guide us towards new drugs that cure devastating diseases!

Just imagine the following scenario: You go to your doctor with a cold, and based on a previous blood test, the doctor knows the metabolite composition of your body fluids. In this case, the doctor could give you a single dose of a drug that is specially made for you and has minimal side effects. This could all become possible with metabolomics!

Key takeaways

  1. Metabolomics is an -omics field that is developing at a rapid rate.
  2. Each person’s metabolome is slightly different due to countless variations in metabolite concentration (these variations essentially provide a biochemical “fingerprint” for every person).
  3. Metabolomic analysis includes initial data processing (scaling, normalization, removing outliers).
  4. Statistical techniques compare different groups/samples of metabolites to derive meaningful conclusions.
  5. Networks are created to visually express relationships between all of the metabolites in a given study.
  6. Metabolomic data can be integrated with other -omic data to provide a larger perspective.

Metabolomics extends past the realms of pure scientific research.

  • We’re currently analyzing cancer metabolism and the Warburg effect. This has the potential to save millions of cancer patients’ lives!
  • Metabolomics will likely contribute to the synthetic biology revolution as well! We could take the genes that code for enzymes producing metabolites with interesting functions, clone them, and insert them into bacteria!

In the past few centuries, medical research’s primary aim has been to eliminate and cure thousands of human diseases. Although each disease has its own underlying pathology, metabolomics techniques can help to cure them and save billions of lives!

If you liked this article, clap it and connect with me on LinkedIn! Stay tuned for more articles related to the intersection between AI and Biochem!

Mukundh Murthy

Innovator passionate about the intersection between structural biology, machine learning, and cheminformatics. Currently @ 99andbeyond.