St Petersburg University researchers teach a neural network to recognise the speech of Holocaust victims
Linguists at St Petersburg University have modified the Wav2Vec 2.0 model and taught it to recognise the speech of people talking about severe emotional shocks they have experienced. The neural network was trained on interviews with Holocaust victims recorded by Yad Vashem, the World Holocaust Remembrance Centre.

Recognising emotional speech is an important task for automated speech processing: effective speech recognition systems can generate subtitles, summarise the main ideas of a video, and convert recordings into text. Cutting-edge technologies are already used for ordinary speech recognition, yet recognising highly emotional speech is a much harder task.
The research findings are published in the proceedings of the International Conference on Speech and Computer.
Understanding what is being said in audio recordings of conversations can be difficult, especially when speakers express their emotions vividly, for example by crying or screaming. Improving the quality of speech recognition systems could significantly simplify and speed up the process of generating interlinear subtitles for interviews with people who have experienced severe emotional shock. This matters particularly when working with people who have witnessed major historical events.
The researchers from St Petersburg University have developed a neural network for speech recognition tasks and, more importantly, for recognising the highly emotional speech that people produce during such interviews.
‘We used the deep neural network Wav2Vec 2.0, pretrained on Russian speech. The idea is to teach the network to map each sound in a person’s speech to the corresponding letter of the alphabet. This neural network architecture also uses the so-called attention mechanism: it learns to attend to the features that matter most for identifying a particular letter from a sound, which significantly improves the quality of the results,’ said Mikhail Dolgushin, a master’s student in the Department of Computational Linguistics at St Petersburg University.
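The sound-to-letter mapping described in the quote can be illustrated with a minimal sketch. It assumes the model outputs one score per letter for every short audio frame and that the transcript is read off with greedy CTC decoding, the approach commonly used with fine-tuned Wav2Vec 2.0 models. The article does not say which decoding method the researchers used, and the toy vocabulary, the `one_hot` helper and the frame scores below are invented purely for illustration.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary (an assumption for this sketch).
# Index 0 is the CTC "blank" symbol, which means "no letter in this frame".
VOCAB: List[str] = ["<blank>", " ", "д", "а", "н", "е", "т"]


def greedy_ctc_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str] = VOCAB,
    blank_id: int = 0,
) -> str:
    """Turn per-frame letter scores into text with greedy CTC decoding."""
    # Step 1: pick the highest-scoring vocabulary entry for every frame.
    best_ids = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Step 2: collapse consecutive repeats, then drop blanks.
    letters: List[str] = []
    previous = None
    for idx in best_ids:
        if idx != previous and idx != blank_id:
            letters.append(vocab[idx])
        previous = idx
    return "".join(letters)


def one_hot(index: int, size: int) -> List[float]:
    """Hypothetical helper: a score vector that fully favours one entry."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Invented frame-by-frame predictions: "д д <blank> а а <blank> <blank>".
frames = [one_hot(i, len(VOCAB)) for i in [2, 2, 0, 3, 3, 0, 0]]
print(greedy_ctc_decode(frames))  # prints: да
```

A sound usually spans several frames, so the same letter is predicted several times in a row. Collapsing the repeats gives one letter per sound, and a blank between two identical predictions keeps genuinely doubled letters apart.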
As a starting point, the St Petersburg University researchers used the Russian-language speech recognition model developed by Ivan Bondarenko, Professor at Novosibirsk State University. The model is publicly available. The performance of the neural network developed at St Petersburg University was tested on interviews with Holocaust victims made publicly available by Yad Vashem, Israel’s official memorial to the victims of the Holocaust. The memorial has been collecting video testimonies from Holocaust survivors for more than 50 years. In the videos, people talk about the events they witnessed, including the occupation of cities, massacres, and life in the ghetto.
27 January is International Holocaust Remembrance Day. On this day in 1945, Auschwitz, Nazi Germany’s largest concentration camp, was liberated by Soviet troops, and on the same day in 1944 the Siege of Leningrad was completely lifted. The 80th anniversary of the complete lifting of the siege is marked in 2024.
The University specialists processed more than 26 hours of conversations. The linguists added sociolinguistic annotations, recording the gender, age, approximate region of origin and native language of each interviewee. As the experts explained, these features significantly influence the speakers’ accents and vocabulary, and therefore how effectively their speech is recognised by automatic models.
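One way to check how such annotations relate to recognition quality is to compute an error rate separately for each speaker group. The sketch below uses word error rate (WER), a standard speech recognition metric. The article does not state which metric the researchers used, so this choice is an assumption, and the field names and sample sentences are invented toy data rather than material from the corpus.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, substitution))
        prev = cur
    return prev[-1], len(ref)


def wer_by_group(samples: List[Dict[str, str]], key: str) -> Dict[str, float]:
    """Aggregate WER per value of one annotation field, e.g. native language."""
    errors: Dict[str, int] = defaultdict(int)
    words: Dict[str, int] = defaultdict(int)
    for sample in samples:
        err, n_words = word_errors(sample["reference"], sample["hypothesis"])
        errors[sample[key]] += err
        words[sample[key]] += n_words
    # Groups with no reference words are skipped to avoid division by zero.
    return {group: errors[group] / words[group] for group in words if words[group]}


# Invented toy data: "reference" is a manual transcript,
# "hypothesis" is what a recogniser might have produced.
samples = [
    {"group": "A", "reference": "we moved to the city in autumn",
     "hypothesis": "we moved to the city in autumn"},
    {"group": "B", "reference": "we moved to the city in autumn",
     "hypothesis": "we move to city in autumn"},
]
print(wer_by_group(samples, key="group"))
```

Errors and word counts are summed per group before dividing, so long recordings weigh more than short ones. Averaging per-recording WER instead would be an equally defensible choice.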
According to the researchers, the technology can also be applied to recordings of other speakers. However, recognition quality may be slightly lower when recording conditions differ or when a type of speech, such as children’s speech, was poorly represented in the sample.
Today, St Petersburg University places particular emphasis on developing educational programmes and research related to artificial intelligence technologies. The University has opened the Centre for Artificial Intelligence and Data Science, which aims to develop and implement large-scale, self-organising, adaptive and distributed digital platforms for the artificial intelligence of things (AIoT) and industrial applications of this technology in the digital industry.
In addition, the Research and Education Centre "Mathematical Robotics and Artificial Intelligence" has been set up at St Petersburg University to combine the efforts of University researchers studying intelligent control, mathematical robotics, and educational robotics. One of the Centre’s current projects is developing a new method for finding people lost in the woods using drones. The University also places emphasis on research into how artificial intelligence affects everyday life. Recently, a sociologist at St Petersburg University proposed methods to combat the threats posed by artificial intelligence.