How are Protein Structures Refined?

Let’s decipher the black box: A Simple Explanation of Force Field Functions like CHARMM, Programs like Rosetta, and Cutting-Edge Deep Learning Algorithms

The physics-based approach

The physics-based approach uses what is called “force fields.” Here’s essentially the equation for a force field:

Bonded Parameters

U(bonded) = U(bond) + U(angle) + U(UB) + U(dihedral) + U(improper) + U(CMAP)
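To make a few of these bonded terms concrete, here is a minimal sketch of the bond, angle, and dihedral contributions using the functional forms commonly written for CHARMM-style force fields (harmonic bonds and angles, a cosine series for dihedrals). The function names and the numeric parameters are made up for illustration and are not taken from any real parameter file. The Urey-Bradley, improper, and CMAP terms are left out to keep the example short.

```python
import math

def bond_energy(b, b0, k_b):
    """Harmonic bond stretch: k_b * (b - b0)^2 (lengths in angstroms)."""
    return k_b * (b - b0) ** 2

def angle_energy(theta, theta0, k_theta):
    """Harmonic angle bend: k_theta * (theta - theta0)^2 (angles in radians)."""
    return k_theta * (theta - theta0) ** 2

def dihedral_energy(phi, k_phi, n, delta):
    """Periodic torsion: k_phi * (1 + cos(n * phi - delta))."""
    return k_phi * (1.0 + math.cos(n * phi - delta))

# Hypothetical parameters, chosen only to illustrate the arithmetic.
u_bond = bond_energy(b=1.55, b0=1.53, k_b=222.5)
u_angle = angle_energy(math.radians(112.0), math.radians(110.0), k_theta=53.35)
u_dihedral = dihedral_energy(math.radians(180.0), k_phi=0.2, n=3, delta=0.0)

# The bonded energy is the sum of the individual terms.
u_total = u_bond + u_angle + u_dihedral
print(f"partial U(bonded) = {u_total:.4f} kcal/mol")
```

Each term is zero (or at its minimum) when the geometry matches its reference value, so the sum grows as a structure is distorted away from ideal bond lengths and angles.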

The Non-bonded Parameters

The non-bonded terms consist of attractions and repulsions between partial dipoles (van der Waals interactions) and electrostatic interactions between charges, based on Coulomb's law.
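As a rough illustration of the non-bonded terms, the sketch below evaluates a 12-6 Lennard-Jones potential (a standard way to model van der Waals attraction and repulsion) and a Coulomb term for a single pair of atoms. The constant of about 332.06 converts charges in units of e and distances in angstroms to kcal/mol; its exact value differs slightly between packages. All atom parameters here are hypothetical.

```python
# Approximate conversion constant in kcal * angstrom / (mol * e^2).
COULOMB_CONSTANT = 332.06

def lennard_jones(r, epsilon, r_min):
    """12-6 Lennard-Jones energy with well depth epsilon and minimum at r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 ** 2 - 2.0 * ratio6)

def coulomb(r, q_i, q_j, dielectric=1.0):
    """Coulomb energy between two point charges separated by distance r."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)

# Hypothetical atom pair: opposite partial charges, 4 angstroms apart.
r = 4.0
u_vdw = lennard_jones(r, epsilon=0.1, r_min=3.5)
u_elec = coulomb(r, q_i=0.4, q_j=-0.4)
print(f"U(vdW) = {u_vdw:.4f}, U(elec) = {u_elec:.4f} kcal/mol")
```

The Lennard-Jones term is strongly repulsive at short range and weakly attractive at longer range, while the Coulomb term is attractive for opposite charges and repulsive for like charges.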

The statistics-based approach

Rather than using physics and the exact distance, position, and angle between each pair of amino acid residues, Rosetta, a protein folding software suite developed by the Baker Lab and collaborators at the Institute for Protein Design, uses statistics to model proteins. In other words, given a certain set of residues with a certain set of parameters, what’s the probability of seeing those amino acids occupy a certain conformation?
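One generic way to turn such probabilities into an energy-like score is the inverse Boltzmann relation, E = -ln(P), where frequently observed conformations get low (favorable) energies. The sketch below applies it to invented counts of backbone conformations. It illustrates the general idea of a knowledge-based potential and is not Rosetta's code; the state names, counts, and pseudocount are assumptions.

```python
import math

def energies_from_counts(counts, pseudocount=1.0):
    """Convert observation counts into energies via E = -ln(P), in units of kT."""
    total = sum(counts.values()) + pseudocount * len(counts)
    return {
        state: -math.log((n + pseudocount) / total)
        for state, n in counts.items()
    }

# Hypothetical counts of how often a residue type is seen in each conformation.
hypothetical_counts = {"helix": 700, "sheet": 250, "left_handed": 50}
energies = energies_from_counts(hypothetical_counts)

for state, energy in sorted(energies.items(), key=lambda item: item[1]):
    print(f"{state:12s} E = {energy:.3f} kT")
```

The pseudocount keeps rarely seen states from receiving an infinite energy when their count is zero.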

  • fa_atr, fa_rep → These two terms represent the attractive and repulsive parts of the van der Waals (Lennard-Jones) energy between atoms
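To show how a single van der Waals potential can be separated into an attractive and a repulsive score term, here is a simplified sketch that splits a 12-6 Lennard-Jones curve at its minimum. This is an illustration under assumed parameters, not Rosetta's actual implementation, which includes further details (such as smoothing) that are not shown here. The names `split_lj` and `d_min` are hypothetical.

```python
def lj(d, epsilon, d_min):
    """12-6 Lennard-Jones energy with well depth epsilon and minimum at d_min."""
    ratio6 = (d_min / d) ** 6
    return epsilon * (ratio6 ** 2 - 2.0 * ratio6)

def split_lj(d, epsilon, d_min):
    """Split the Lennard-Jones energy into (attractive, repulsive) parts.

    Inside the minimum, the attractive part is held at -epsilon and the
    repulsive part carries the rest; outside it, everything is attractive.
    """
    total = lj(d, epsilon, d_min)
    if d <= d_min:
        return -epsilon, total + epsilon
    return total, 0.0

# Hypothetical atom pair with a 0.1 kcal/mol well at 4.0 angstroms.
for d in (3.0, 4.0, 5.0):
    atr, rep = split_lj(d, epsilon=0.1, d_min=4.0)
    print(f"d = {d:.1f}  attractive = {atr:.4f}  repulsive = {rep:.4f}")
```

Splitting the potential this way lets the two parts be weighted separately in a scoring function, while their sum still reproduces the original curve.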

Rosetta AbInitio Demo

Rosetta is a biophysical modeling software suite that uses simulations and scoring functions to optimize structures. In this project, I used what are known as ab initio protein folding methods to predict the three-dimensional structure of the nsp9 RNA-binding domain of SARS-CoV-2 given only its sequence.

In the image to the left, you can see that the RMSD (root-mean-square deviation, a measure of how far the model's atoms are from the reference structure) goes down as the energy scoring function is minimized and becomes more negative.
Protein alignment generated in PyMOL with ~6.0 Å RMSD
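For reference, the RMSD quoted above is the square root of the mean squared distance between matched atoms in two structures. The sketch below computes it for coordinates that are assumed to be already superimposed; alignment tools perform that superposition step first, and it is not shown here. The coordinates are invented.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the two structures are already superimposed and that atoms are
    listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical alpha-carbon coordinates for a model and a reference structure.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 1.0, 0.0), (3.8, -1.0, 0.0), (7.6, 1.0, 0.0)]
print(f"RMSD = {rmsd(model, reference):.2f} angstroms")
```

A lower RMSD means the predicted model sits closer to the reference structure.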
Like ResNet, ResNeXt uses skip connections; however, it also adds multiple parallel paths, each with its own set of weights, that converge at the concatenation.

Results

Why is this Important?

When COVID-19 first came to light, we didn’t have any protein structures for the key players in the virus’s life cycle, including proteases (which cleave the viral polyproteins into functional proteins) along with helicases and polymerases that allow for the replication and transcription of the viral RNA. Getting these structures took weeks, and we couldn’t sit still while the virus claimed thousands of lives.

Mukundh Murthy

Innovator passionate about the intersection between structural biology, machine learning, and cheminformatics. Currently @ 99andbeyond.