The fourth International Conference 'Bioinformatics: from Algorithms to Applications' (BiATA 2020), held annually by the Center for Algorithmic Biotechnology at St Petersburg University, has been successfully completed.

Held online this year, BiATA attracted four times more participants than usual. The organisers also noted a significant increase in the number of researchers from Russia. Alla Lapidus, Chair of the Conference Programme Committee and Associate Director of the Center for Algorithmic Biotechnology of the Institute of Translational Biomedicine at St Petersburg University, believes this is a sign that bioinformatics in Russia is reaching a new level.

Russia is home to unique natural sites that have not yet been explored. BiATA 2020, however, showcased research that our scientists have conducted on the microbiota of Lake Baikal, the White Sea, the taiga forests of Siberia and more. I am sure that the number of these projects will only increase in the future.

Alla Lapidus, Associate Director of the Center for Algorithmic Biotechnology of the Institute of Translational Biomedicine at St Petersburg University

Over the years, the conference has evolved into a full-scale platform to bring together major developers of software products for the analysis of modern biological data and researchers supplying primary data from around the world.

Although the topics of the talks are wide-ranging, from the decoding of the human genome to many kinds of metagenomic research, there is one thing that unites them all: huge amounts of data, the interpretation of which requires reliable and easy-to-use software.

Alla Lapidus, Associate Director of the Center for Algorithmic Biotechnology of the Institute of Translational Biomedicine at St Petersburg University

On the first day of the conference, Professor Phil Hugenholtz, Director of the Australian Centre for Ecogenomics at the University of Queensland (Australia) and a leading expert in taxonomy, gave a presentation on the problems of classifying prokaryotes, simple single-celled organisms, in the era of big data. According to the scientist, the modern classification of microorganisms does not take into account almost 85% of all microbial diversity: the uncultivated microorganisms, which cannot be grown in the laboratory. 'In developing a new genome-based systematics, we used the existing classification of microbes obtained in laboratories. In the first stage, we divided up polyphyletic groups – sets of species that descend from several different ancestors. Then it was necessary to take into account evolutionary divergence – the divergence of features of related groups during evolution,' explained Professor Hugenholtz.

As a result, the scientists managed to obtain a fully systematised classification of bacteria that takes evolutionary processes into account. It turned out that, of the hundreds of thousands of bacterial genomes available for study, more than half required a change in classification. The changes affected both the higher ranks (phyla) and the lower ranks (genera). For example, the genus of anaerobic Clostridium bacteria was divided into a hundred new, separate genera.

Artem Babaian from the University of British Columbia (Vancouver, Canada) spoke about a large-scale project – Serratus. It is aimed at finding new coronaviruses in public databases, which is especially relevant in the current epidemiological situation. 'The main problem is that we still do not have a complete understanding of biodiversity and the nature of viruses. Besides, there are very few fully deciphered genomes in the public domain. Therefore, in order to identify new coronavirus sequences, we analyse all available data on both already known and completely unknown viruses. This is more than 3.4 million samples of biological material from around the world that still need to be deciphered,' said the researcher.

Assembling the genomes of RNA viruses, and coronaviruses first of all, is possible thanks to a new development of the Center for Algorithmic Biotechnology at St Petersburg University. coronaSPAdes is a specialised version of the SPAdes assembler (St Petersburg Assembler), the flagship product of the Center. Artem Babaian noted that the new tool helped Serratus to optimise the assembly process. Unlike other software products, the assembler takes into account the specific features of RNA-virus genome structure. coronaSPAdes is the result of many years of research at the Center for Algorithmic Biotechnology at St Petersburg University. The developers, Dmitry Meleshko and Anton Korobeynikov, say that without this research the creation of the assembler would have been impossible.

The conference participants had an opportunity to learn to work with MGnify, an innovative software tool, as part of the workshop 'Analysis of metagenomic assemblies using MGnify'. Rob Finn, Lorna Richardson and Alexandre Almeida from the European Bioinformatics Institute (EMBL-EBI) explained the specifics of working with MGnify to novices in bioinformatics. By the end of the conference, the participants had learned not only to assemble complex metagenomic data, but also to annotate the sequences obtained after assembly, describing them functionally and taxonomically.