St Petersburg University scholars create a method for automatic detection of irony in speech
In a study of speech, linguists from St Petersburg University have identified the patterns of intonation, gesture and facial expression typical of speakers who use irony. Pinning down the key differences between ironic and neutral speech will help to improve automatic speech recognition and audio-visual synthesis systems based on artificial intelligence.
Voice and audio-visual assistants have become part of everyday life. Yet many of the devices people use in everyday speech are clear only in conversation between humans. Ironic statements, for example, depend directly on the situation and on intonation, so people usually grasp them intuitively, while artificial intelligence systems generally fail to recognise them.
The research findings are published in the proceedings of the International Conference on Speech and Computer.
When watching a film or reading a book, you can tell from the context whether a character's words are ironic. In many modern exchanges, however, context is missing. When we say ‘You wish!’, ‘Sure!’ or ‘Aha!’ ironically during a telephone conversation, the irony is conveyed by intonation alone.
Unlike ordinary humour and jokes, irony does more than add a playful note to the meaning: it allows speakers to call their own statement into question or even reverse its meaning. If a voice assistant fails to recognise this kind of irony and takes an ironic ‘Sure!’ as agreement, the exchange may break down or lead to undesirable consequences.
According to the linguists, video can also help to reveal irony in a speaker's words, since facial expressions and gestures can make the intended meaning clear. The researchers from St Petersburg University analysed fragments of conversations from Russian films and TV series, because actors express emotions especially vividly. They then recorded 60 native speakers of Russian in an acoustic booth at the Department of Phonetics and Methods of Teaching Foreign Languages of St Petersburg University. This is the first time such a collection of ironic speech samples in Russian has been compiled.
The speakers were recorded simultaneously with a professional microphone and a high-frame-rate video camera. The researchers analysed the speakers' gestures and facial expressions and processed the audio signals with specialised software. They then ran three perception experiments. In the first, participants listened to a short audio fragment and chose the matching text passage, either with or without irony; in the second, viewers performed the same task while watching the footage without sound; in the third, participants listened and watched at the same time. According to Uliana Kochetkova, Associate Professor in the Department of Phonetics and Methods of Teaching Foreign Languages at St Petersburg University, this made it possible to identify the intonation, gestures and facial expressions through which native speakers hear and see irony even without knowing the context. Next, the characteristics of the ironic utterances were ‘transplanted’ onto the non-ironic ones (and vice versa), and the modified recordings were again played to native speakers. This helped to assess the role each characteristic plays in expressing ironic meaning.
‘We conducted experiments on audio perception and also made various modifications to the audio signal in order to pinpoint the patterns of intonation and gesture (facial expression) that listeners and viewers perceive as ironic without relying on any context,’ Uliana Kochetkova said.
According to Uliana Kochetkova, what matters in ironic utterances is the contrast with the usual ‘neutral’ mode: the speech rate, volume, pitch register and voice range all change. How they change also depends on the type of sentence. Ironic questions, for example, differ from exclamations by a more noticeable increase in volume and more noticeable lengthening of words and individual sounds.
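To make the idea of contrast with the neutral mode more concrete, here is a minimal Python sketch that compares a neutral and an ironic rendition of the same phrase on the four parameters named above. It is an illustration under stated assumptions, not the researchers' method: the Prosody record, its fields and the contrast function are hypothetical, the measurements are assumed to have been extracted beforehand with phonetic software, and pitch differences are expressed in semitones, a common convention in phonetics.

```python
import math
from dataclasses import dataclass


@dataclass
class Prosody:
    """Hypothetical per-utterance measurements, assumed to be extracted beforehand."""
    syllables_per_sec: float  # speech rate
    intensity_db: float       # mean loudness (volume)
    f0_mean_hz: float         # mean pitch, i.e. the voice register
    f0_min_hz: float          # lowest pitch in the utterance
    f0_max_hz: float          # highest pitch in the utterance


def semitone_interval(f_to: float, f_from: float) -> float:
    """Signed pitch interval from f_from to f_to, in semitones."""
    return 12.0 * math.log2(f_to / f_from)


def contrast(neutral: Prosody, ironic: Prosody) -> dict[str, float]:
    """Compare an ironic rendition with a neutral baseline (hypothetical helper)."""
    neutral_range = semitone_interval(neutral.f0_max_hz, neutral.f0_min_hz)
    ironic_range = semitone_interval(ironic.f0_max_hz, ironic.f0_min_hz)
    return {
        # below 1.0 means the ironic version is spoken more slowly
        "rate_ratio": ironic.syllables_per_sec / neutral.syllables_per_sec,
        # positive means the ironic version is louder
        "intensity_diff_db": ironic.intensity_db - neutral.intensity_db,
        # positive means the ironic version sits in a higher register
        "register_shift_st": semitone_interval(ironic.f0_mean_hz, neutral.f0_mean_hz),
        # positive means the ironic version uses a wider voice range
        "range_change_st": ironic_range - neutral_range,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    neutral = Prosody(5.0, 65.0, 200.0, 150.0, 300.0)
    ironic = Prosody(4.0, 70.0, 200.0, 150.0, 600.0)
    print(contrast(neutral, ironic))
```

With these made-up values, the ironic version comes out slower (a rate ratio of 0.8), 5 dB louder, in the same register, and with a voice range wider by 12 semitones (one octave). A real system would also need to normalise for each speaker and, as the researchers note, take the sentence type into account.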
In gestures and facial expressions, irony is marked not only by specific movements, for example of the lips and eyebrows, but also by how these movements are combined and how they relate to the audio signal. Through this fundamental research, the scholars have established which parameters people use to give their speech an ironic colouring and which cues they rely on to detect irony in what others say.
Identifying speech patterns such as irony from both sound and gesture makes it possible to improve automatic speech recognition and audio-visual synthesis systems based on artificial intelligence.