One recent scientific breakthrough is the development of algorithms that can predict the structure of a protein. In the long term, this will make it possible to design new personalised medicines and vaccines on a computer in weeks or even days. Mathematician and biologist Oleg Mikhailovskii spoke to TASS about the study of proteins, explaining how understanding their structure can help doctors and why a scientist needs to be a ‘multihyphenate’.
Science fiction writers of the past dreamed of an artificial super-mind that would solve all of mankind’s problems. Nowadays, however, we more often see headlines such as ‘Neural network fakes the voices of celebrities’, ‘Neural network learns to replace actors’ faces in films’ or ‘Neural network records a rap album’...
There are, however, some truly significant advances. One of them concerns predicting how living systems behave. At the end of 2020, researchers at DeepMind, a research arm of Google, presented a program that can predict the three-dimensional structure of a protein from its amino acid sequence. With this information, scientists can learn a great deal about proteins, including how they behave.
Oleg Mikhailovskii, a Russian scientist who studies protein structure, is a member of the Laboratory of Biomolecular NMR at St Petersburg University. He is developing an add-on to the Amber software package, one of the world’s two largest tools for biomolecular modelling. His add-on makes it possible to obtain high-precision protein structures by combining X-ray diffraction data with molecular dynamics simulations. We asked Mr Mikhailovskii to explain, in simple terms, research that most people know little about but that could change our lives in the very near future.
On why protein structure is the ‘mystery of the century’
Proteins perform a great many of an organism’s functions; it is easier to say what they do not do. For example, the enzymes that help us digest food are proteins. Fibrillar proteins make up structures such as muscles and hair. Proteins are involved in the immune system and in the body’s self-regulation, for example blood coagulation. Imagine laces of different colours tied together into one long string and woven into a very intricate shape. The laces are amino acids, the essential building blocks of proteins. There are only 20 basic amino acids, but they can be arranged in different sequences. Once you know the sequence, you can try to predict theoretically the structure of a protein and how it behaves in the body. This opens up enormous possibilities.
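To put ‘only 20’ into perspective, here is a back-of-the-envelope count (an illustration, not part of the interview): if each position in a chain of n amino acids can hold any of the 20 basic types, the number of possible sequences N grows as 20 to the power n. The chain length of 100 used below is an arbitrary example.

```latex
N(n) = 20^{n}, \qquad
N(100) = 20^{100} = 2^{100} \cdot 10^{100} \approx 1.27 \times 10^{130}
```

Even a modest chain therefore sits in an astronomically large space of possible sequences.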
Take Alzheimer’s disease, for example. A typical sign of it is the accumulation of misfolded beta-amyloid proteins in the brain, which causes neurons to fail. Essentially, the misfolded proteins start to stack on top of each other and block the interaction between cells. Such protein defects could be identified, and then, based on a structural model, it would be possible to introduce a mutation exactly where it is useful, for example to increase or decrease the activity of a particular protein.
On the breakthrough brought about by neural networks
In principle, the structure of a protein can be determined experimentally, for example by X-ray crystallography. How does this work? We grow a crystal of the protein, shine an X-ray beam on it and record a diffraction image on the detector. Then we locate the intensity peaks (spots) in the image and use them to build a model of the protein. In other words, we work out where each atom is located.
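The spot positions carry geometric information about the crystal. As a textbook illustration, and not a description of the interviewee’s own software, Bragg’s law n·λ = 2·d·sin θ links the angle θ at which a reflection appears to the spacing d between lattice planes, for X-rays of wavelength λ. The sketch below solves it for d; the wavelength and angle are made-up example values. Spot positions alone do not give an atomic model, because the phases of the diffracted waves are not recorded, which is one reason model building is hard.

```python
import math


def bragg_spacing(wavelength_angstrom: float, theta_deg: float, order: int = 1) -> float:
    """Return the lattice-plane spacing d (in Å) from Bragg's law n*lambda = 2*d*sin(theta)."""
    theta = math.radians(theta_deg)
    if not 0 < theta < math.pi / 2:
        raise ValueError("theta must lie strictly between 0 and 90 degrees")
    return order * wavelength_angstrom / (2 * math.sin(theta))


if __name__ == "__main__":
    # Hypothetical reading: a first-order spot at theta = 15 degrees with 1.0 Å X-rays
    d = bragg_spacing(1.0, 15.0)
    print(f"d = {d:.3f} Å")  # d = 1.932 Å
```

Larger angles correspond to smaller spacings, which is why spots far from the centre of the detector carry the finest structural detail.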
Building such a model from scratch, however, is very difficult and costly. Depending on the protein, growing a good-quality crystal can take several months, and it is almost impossible to foresee how long it will take to find the ‘right’ growth conditions. Some proteins do not crystallise at all. There are other experimental methods, such as NMR spectroscopy, but they may not work for various reasons. This is where computer modelling comes in handy. If we do not know the structure of a new protein, we can try to infer it from the structures of its ‘relatives’. Over the years, experiments have accumulated data on about 170,000 structures, and a machine-learning algorithm trained on this data can produce fairly accurate predictions. Google’s AlphaFold2 neural network is one example. However, its predictions can be off by about the size of an atom, and that can matter when predicting how the protein interacts with other molecules.
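Learning from ‘relatives’ starts with measuring how similar two sequences are. The sketch below computes percent identity between sequences that are assumed to be already aligned (‘-’ marks a gap) and picks the most similar template. The sequences and template names are invented, and this is only a crude illustration: real pipelines use dedicated alignment tools, and AlphaFold2 works from multiple sequence alignments rather than a single identity score.

```python
from __future__ import annotations


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Compare only columns where neither sequence has a gap
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def best_template(query: str, templates: dict[str, str]) -> tuple[str, float]:
    """Return (name, identity) of the template most identical to the aligned query."""
    scored = {name: percent_identity(query, seq) for name, seq in templates.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]


if __name__ == "__main__":
    # Hypothetical query and 'relatives', already aligned to the same length
    query = "MKT-AYIAK"
    templates = {"relative_1": "MKTLAYLAK", "relative_2": "MRS-AYVAK"}
    print(best_template(query, templates))  # ('relative_1', 87.5)
```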
Nevertheless, we still need experiments. AlphaFold is trained on a particular data set, and it is hard to say how well it performs beyond that set. For known proteins, we can confirm or refute the network’s predictions, but for new ones there is no way to verify them. That is why we need experimental data to check the predictions. Processing this data with our software add-on is what allows us to build the most accurate protein models, something no neural network has managed so far.
On conducting a virtual experiment
Strange as it sounds, a computer experiment is still an experiment. You run a simulation and look at the results. The simulation follows a basic protocol: you set the conditions and see what happens. There might be, say, a hundred sets of conditions, and naturally nobody sets them all up by hand. Instead, I write a script. I think any modern scientist should know how to work with scripts: it is no longer possible to do without them, and typing in 100 conditions by hand is no fun.
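As an illustration of the kind of script described here, the following minimal Python sketch expands a parameter grid into 100 conditions and writes one input file per run. The parameters (temperature, salt concentration, random seed), their values and the JSON file format are all hypothetical; a real engine such as Amber has its own input format.

```python
import itertools
import json
from pathlib import Path

# Hypothetical grid: 5 temperatures x 4 salt concentrations x 5 random seeds = 100 runs
TEMPERATURES_K = [280, 290, 300, 310, 320]
SALT_MOLAR = [0.0, 0.05, 0.15, 0.5]
SEEDS = [1, 2, 3, 4, 5]


def build_conditions():
    """Return one dict per simulation condition (the full Cartesian product of the grid)."""
    grid = itertools.product(TEMPERATURES_K, SALT_MOLAR, SEEDS)
    return [
        {"run_id": i, "temperature_K": t, "salt_M": s, "seed": seed}
        for i, (t, s, seed) in enumerate(grid)
    ]


def write_inputs(conditions, out_dir):
    """Write each condition to its own JSON file so that a job scheduler can pick it up."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for cond in conditions:
        (out / f"run_{cond['run_id']:03d}.json").write_text(json.dumps(cond, indent=2))
    return out


if __name__ == "__main__":
    import tempfile

    # Write into a throwaway directory so the demo leaves nothing behind
    with tempfile.TemporaryDirectory() as tmp:
        conds = build_conditions()
        write_inputs(conds, tmp)
        print(len(conds), "input files written")
```

Changing a list at the top regenerates every run consistently, which is exactly what is hard to guarantee when 100 conditions are typed in by hand.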
How does this work in practice? Suppose we need to predict how one molecule binds to another. There is a mathematical function that describes how atoms interact with each other. We plug in the conditions and run the calculation for a thousand different molecules against one target molecule, which means a thousand simulations. Then, based on the results, we can understand, for example, how a drug molecule will work.
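In molecular modelling, such a function is typically a force field. A common textbook form for the non-bonded interaction between two atoms at distance r combines a Lennard-Jones 12-6 term with a Coulomb term. The sketch below uses that form to score toy ligands against a toy target and rank them. It is a single-point energy evaluation, a drastic simplification of a real simulation (no motion, no solvent, one shared atom type), and all coordinates, charges and parameters are invented for illustration.

```python
import math
import random

COULOMB_KCAL = 332.06  # approximate Coulomb constant in kcal·Å/(mol·e²)


def pair_energy(r, epsilon, sigma, qi, qj):
    """Lennard-Jones 12-6 term plus Coulomb term for one atom pair at distance r (Å)."""
    sr6 = (sigma / r) ** 6
    lj = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_KCAL * qi * qj / r
    return lj + coulomb


def interaction_energy(ligand, target, epsilon=0.1, sigma=3.4):
    """Sum pair energies over all ligand-target atom pairs.

    Atoms are (x, y, z, charge) tuples; epsilon and sigma are one hypothetical atom type.
    """
    total = 0.0
    for x1, y1, z1, q1 in ligand:
        for x2, y2, z2, q2 in target:
            r = math.dist((x1, y1, z1), (x2, y2, z2))
            total += pair_energy(r, epsilon, sigma, q1, q2)
    return total


def rank_ligands(ligands, target):
    """Return (index, energy) pairs sorted from the most favourable (lowest) energy up."""
    scores = [(i, interaction_energy(lig, target)) for i, lig in enumerate(ligands)]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy target: four charged atoms near the origin (made-up values)
    target = [(0.0, 0.0, 0.0, -0.5), (1.5, 0.0, 0.0, 0.3),
              (0.0, 1.5, 0.0, 0.2), (0.0, 0.0, 1.5, 0.0)]
    # 1000 toy ligands: three atoms in a row, centred 4-8 Å from the origin on each axis
    ligands = []
    for _ in range(1000):
        cx, cy, cz = (rng.uniform(4.0, 8.0) * rng.choice((-1, 1)) for _ in range(3))
        ligands.append([(cx + dx, cy, cz, q) for dx, q in ((0.0, 0.4), (1.2, -0.2), (2.4, -0.2))])
    print(rank_ligands(ligands, target)[:3])
```

The loop over ligands is the ‘thousand calculations against one target’ in miniature; in practice each entry would be a full simulation rather than one energy evaluation.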
The most interesting part comes when you have the results and try to discern a pattern using your evaluation criteria. Sometimes you find there is no pattern, and then you try to explain why. When you have tried everything and realised that nothing works, it is frustrating. I try to forget about days like that, though. In the morning I wake up refreshed and try again.
On the transition from mathematician to biologist
Although most of my work now takes place on a computer, I can also connect an electrode to a mouse brain or run a spectroscopic experiment. My starting point, however, was mathematics. I went to an ordinary school in St Petersburg and, at the same time, learned programming at the Computer Technologies Centre at Anichkov Palace.
One day, when I came to sign up for another year, my teacher advised me to try to enrol in the Laboratory for Continuing Mathematical Education. I went there, spoke to the director, passed some difficult tests and began studying. Unusually, classes there were called ‘pairs’ (double periods) from the 9th grade onwards, and we were even assigned research supervisors. As a student there, I began attending Russian and international scientific conferences for schoolchildren and taking part in competitions. I went to the famous Intel ISEF in the US and won an award from the American Mathematical Society.
I went on to study mathematics at the Mathematics and Mechanics Faculty of St Petersburg University, specialising in differential equations and dynamical systems. I even started a postgraduate programme, but I was still not satisfied. So I began applying to other universities for programmes in other fields and took non-mathematical courses on Coursera and other platforms. I got interested in biochemistry and neural networks. The stars aligned and my application was accepted: I was offered a place on an interdisciplinary programme at Purdue University in the USA. In the first year of the programme, every student does four rotations with different supervisors to find the best fit. I did some programming in bioinformatics and worked in a neuroscience lab. I even dissected mice. No cruelty, though; it was all very civilised.
I can accurately attach an electrode to a mouse’s brain, which is not easy, and my supervisor told me I was good at it. But I was put off by... the smell. The vivarium, where the mice are kept, smelled really awful. After two months of working there, I felt as if I could smell the vivarium all the time, even after changing my clothes.
After the rotations, I chose a laboratory in the Department of Chemistry that focused on NMR spectroscopy of proteins. I later changed my dissertation topic to X-ray crystallography. My defence took place during the lockdown, but I gave it in person on campus: a large hall seating 100 people hosted just four or five committee members, seated at opposite ends of the room. Then I returned to Russia. As it happens, the head of my dissertation committee at Purdue, Nikolai Skrynnikov, also works here at the Laboratory of Biomolecular NMR at the Institute of Translational Biomedicine, St Petersburg University. I now work in this laboratory, and we are continuing the project I started while writing my dissertation.
On competition and cooperation
Today, science is a team endeavour. Of course, there is competition, for example for grants. When you are interested in a particular subject, you know who around the world is working on it, and you look for gaps in other people’s work that might be useful to you. And of course you want to publish your results before anyone else.
It can happen that, six months before your publication, someone else publishes an article on the same topic. Frustrating as that is, you look for a way forward. You look at what they have done and think: aha, I will do it differently. It may look similar, but the approach is the other way round. Science is not a jungle where you eat or are eaten, though. There is competition, but you can write to someone and get advice. You can even make an agreement: ‘I’ll wait until you publish, and then I’ll release my results’. Collaborations between scientists from different countries and laboratories are common. We collaborate with universities in Europe, America and Asia, for example Rutgers University in the USA and Tsinghua University in China.
You know, I recall a lecture by James Watson, co-discoverer of the double-helix structure of DNA, at St Petersburg University in 2017 or thereabouts. He was asked what matters most for a scientist. Bear in mind that his field is pretty much the same as mine. He named three things... I can’t vouch for the exact words, but I’ll try to convey the meaning. First, stay curious. Second, be kind. I don’t remember the third, but I believe the first two are absolutely true. Stay curious about your results and be kind to your colleagues. Without this, it is difficult to communicate and live in a community.
On plans and contribution to science
The problem with X-ray models is that they are static, whereas a protein is a dynamic system. In some places, the model of the protein structure looks ‘washed out’, and it is not clear what is happening there. My task is to make the software ‘figure out’ how the protein behaves in the regions that the X-ray image fails to capture.
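One standard way to connect motion with ‘washed-out’ regions is through atomic fluctuations. In crystallography, an atom’s isotropic B-factor is related to its mean-square displacement by B = (8π²/3)·⟨Δr²⟩, so strongly fluctuating atoms have large B-factors and blurred density. The sketch below computes per-atom root-mean-square fluctuation (RMSF) from a list of simulation frames and converts it to a B-factor. It assumes the frames are already superimposed on a common reference and uses made-up coordinates; it is not the add-on described in the interview.

```python
import math


def rmsf(frames):
    """Per-atom root-mean-square fluctuation (Å) about the mean position.

    frames: list of frames, each a list of (x, y, z) tuples in Å, same atom order in every frame.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for a in range(n_atoms):
        mean = [sum(f[a][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(sum((f[a][k] - mean[k]) ** 2 for k in range(3)) for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result


def b_factor_from_rmsf(rmsf_angstrom):
    """Isotropic B-factor (Å²) from RMSF: B = (8*pi^2/3) * RMSF^2."""
    return 8.0 * math.pi ** 2 / 3.0 * rmsf_angstrom ** 2


if __name__ == "__main__":
    # Toy trajectory: atom 0 stays put, atom 1 swings 1 Å either side along x
    frames = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
              [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]]
    print([round(b_factor_from_rmsf(x), 2) for x in rmsf(frames)])  # [0.0, 26.32]
```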
It would be great if this tool found a wider audience. Still, this is a fundamental task, internal to science, so to speak. As for bigger plans... I would be interested in doing something more applied, for example developing products that could help tackle environmental problems in the short term, such as designing enzymes, a class of proteins, that break down plastics and so help make the world a better place. Enzymes are already bringing tangible benefits to the world around us. After all, I hope my frequent changes of field were not for nothing. I am interested in what can be useful for nature and our environment. To tell the truth, entering a new field is scary. But somehow you push yourself over the edge and step into the abyss. What other way is there?
On the superiority of machines over humans, and vice versa
A neural network does not really let us understand how it arrives at a conclusion. It produces a result but does not explain why. If a neural network has drawn a picture or written music, I may or may not like it. But when a human drummer falls out of rhythm, I wonder whether it was deliberate or accidental. When a neural network does it, it is absolutely intentional: it is an algorithm. We seem to understand what it is doing, but we do not understand what goes on inside it.
A machine is a black box, but in some ways so is a human being. Sometimes it takes me a long time to make a decision. Or I struggle to understand how a particular formula should be applied, as used to happen in my maths classes. Then suddenly, at some point, I wake up and realise: that’s what it’s for! Who can really explain why this happens at a particular moment? What came together in the brain, and what triggered it? So a human being is indeed a black box, and a very interesting one, too.