St Petersburg University phoneticist talks about voice assistants from a language perspective on a podcast from St Petersburg University

Uliana Kochetkova, Associate Professor in the Department of Phonetics and Methods of Teaching Foreign Languages at St Petersburg University, was the guest of the ninth episode of the popular science podcast "Heinrich Terahertz". She spoke about how voice assistants perceive our speech, why they do not recognise questions and irony, and whether they need to be taught to do so.
Voice assistants are an important part of many businesses today. They can book a doctor’s appointment for you; process orders; answer general knowledge queries; deliver weather updates; play music; and much more. Unlike a human listener, however, artificial intelligence (AI) cannot fully comprehend speech. Uliana Kochetkova explained that AI-based technologies are primarily focused on text: they transform any voice message into a string of symbols and then analyse those symbols.
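The consequence of this text-first pipeline can be sketched in a few lines. This is an illustrative toy, not any real system: the stand-in `transcribe` function and the sample data are assumptions made up for the example, showing only that once speech becomes text, prosodic cues such as intonation are gone.

```python
# Illustrative sketch only: the function and data are hypothetical,
# not the systems described in the article.

def transcribe(utterance: dict) -> str:
    """A stand-in for speech-to-text: keeps only the words,
    discarding pitch, stress, and other prosodic information."""
    return utterance["words"]

# Two spoken versions of the same phrase: one neutral, one ironic.
neutral = {"words": "great job", "pitch_contour": "falling"}
ironic = {"words": "great job", "pitch_contour": "exaggerated rise-fall"}

# After transcription the two are indistinguishable: the cue carried
# by intonation is lost before the analysis even begins.
print(transcribe(neutral) == transcribe(ironic))  # True
```

Whatever analysis follows operates on `"great job"` in both cases, which is why a purely textual system cannot tell the condescending reading from the sincere one.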
"Stress in speech is of particular importance when we have to interpret spoken language. The same phrase can sound condescending or ironic. If you write down this phrase, the implied message may not be clear. It is linguistic cues, specific to each language, that help you understand it correctly," she said.
In spoken language, meaning can be conveyed through: intonation patterns; gestures; facial expressions; and the word order in a sentence. The latter provides a specific linguistic cue that helps algorithms recognise that a person is asking a question. Linguistic norms, however, do not always require an interrogative sentence to have a particular word order, and question words are not obligatory in such sentences either.
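A toy heuristic makes the limitation concrete. The word lists and rules below are illustrative assumptions, not a production question detector: they mimic the two textual cues just mentioned (a question word, or inverted word order) and fail exactly where the article says text fails, on a declarative-order question signalled only by rising intonation.

```python
# Toy heuristic for English, relying only on textual cues.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}
AUXILIARIES = {"do", "does", "did", "is", "are", "was", "were",
               "can", "could", "will", "would", "should", "have", "has"}

def looks_like_question(sentence: str) -> bool:
    words = sentence.lower().strip("?!. ").split()
    if not words:
        return False
    # Cue 1: a question word at the start ("Where are you going").
    if words[0] in QUESTION_WORDS:
        return True
    # Cue 2: inverted word order, auxiliary first ("Are you coming").
    if words[0] in AUXILIARIES:
        return True
    return False

print(looks_like_question("Where are you going"))  # True
print(looks_like_question("Are you coming"))       # True
# A question with declarative word order, marked only by rising
# intonation in speech, slips past a purely textual heuristic:
print(looks_like_question("You are coming"))       # False
```

The last case is the problem the article describes: nothing in the written words marks the sentence as a question, so the cue must come from intonation, which the text no longer contains.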
Earlier, linguists at St Petersburg University conducted research on speech and identified the gesture, intonation, and facial-expression patterns typical of a person using ironic expressions in his or her speech. Determining the key differences between ironic and neutral conversation will help improve automatic speech recognition and audio-visual synthesis systems based on artificial intelligence.
For this reason, AI-based voice assistants cannot yet accurately interpret the emotional content of a message. Researchers are developing algorithms that teach systems to recognise such language techniques. In the professional community, however, there is no consensus on whether AI should be taught to understand humans this accurately.
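One way such a cue could feed an algorithm can be sketched with a single prosodic feature. Everything here is a hypothetical illustration, not the University's actual model: the feature (pitch range in semitones), the threshold, and the sample contours are all assumptions, chosen only to show how an exaggerated intonation pattern might be turned into a machine-readable irony signal.

```python
import math

def pitch_range_semitones(f0_values: list[float]) -> float:
    """Pitch range of an utterance in semitones above its minimum F0."""
    lo, hi = min(f0_values), max(f0_values)
    return 12 * math.log2(hi / lo)

def flag_possible_irony(f0_values: list[float],
                        threshold_st: float = 12.0) -> bool:
    """Hypothetical rule: treat an unusually wide pitch range
    (exaggerated rise-fall) as a weak cue for irony."""
    return pitch_range_semitones(f0_values) > threshold_st

# A flat, neutral contour vs. an exaggerated rise-fall (F0 in Hz).
neutral_f0 = [110, 115, 112, 108, 110]
ironic_f0 = [100, 180, 260, 150, 95]

print(flag_possible_irony(neutral_f0))  # False
print(flag_possible_irony(ironic_f0))   # True
```

A real system would combine many such features, including the gesture and facial-expression patterns the researchers identified, rather than a single threshold on pitch range.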
"Artificial intelligence is a great help in our lives, making many tasks easier. The question is whether we really want a voice assistant to know that we are, for instance, scared or stressed. This opens up an array of possible scenarios, not all of which are safe," said Uliana Kochetkova.