The United Institute of Informatics Problems of the National Academy of Sciences of Belarus has been engaged in speech technology for more than 40 years. A new direction, proposed by Boris Lobanov, Ph.D., the former head of the laboratory and now its chief researcher, is computer cloning of a person's voice.
This technology makes it possible to read any text aloud in the voice and reading manner of a particular person, and to recreate the voices of well-known personalities.
Lilija CIRUĹNIK, the acting head of the speech synthesis and recognition laboratory of the United Institute of Informatics Problems of the National Academy of Sciences, tells us about the prospects for the development of speech technologies.
— What is speech recognition?
— The ultimate goal of speech recognition is to make a computer program understand the meaning of an utterance and perform some action. There are two tasks. The first is the recognition of separate voice commands. For example, instead of entering commands with the keyboard or mouse, you can give them by voice. The system will respond: select text, copy, move to the line above.
The system can be used in manufacturing when working with complex equipment, where voice commands can replace mechanical levers.
The second task is the so-called recognition of continuous speech. It is like stenography: the computer will be able to give you our conversation as a text file.
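The first task, separate voice commands, can be illustrated with a minimal sketch. It assumes a recognizer has already turned the audio into a text transcript; the command table and action names below are hypothetical and are not taken from the laboratory's system.

```python
from typing import Optional

# Hypothetical command table: recognized phrase -> editor action name.
COMMANDS = {
    "select text": "select",
    "copy": "copy",
    "move to the line above": "cursor_up",
}


def dispatch(transcript: str) -> Optional[str]:
    """Return the action for a recognized phrase, or None if it is unknown."""
    # Normalize case and collapse repeated whitespace before the lookup.
    phrase = " ".join(transcript.lower().split())
    return COMMANDS.get(phrase)
```

A closed command set like this is much easier to recognize reliably than continuous speech, because the system only has to choose among a few known phrases.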
— Is speech synthesis the reverse?
— Yes. A speech synthesizer is a computer program that produces voice output from entered text, creating audio files that correspond to the input text. If you wish, the program will read Leo Tolstoy or a newspaper article to you. The main thing is to have the original text file.
— Then whose voice will it be?
— Any text of any size can be read by a male or female voice. With our original technology, we can create a personal voice of any person. During playback, you can change the tone of the voice, the speed and the volume. The resulting voice recording can be saved in different formats, for example, in the popular MP3 format.
— How can you use it in practice?
— With it, for example, we can create audiobooks. Of course, a professional actor will voice an audiobook much better than a computer program. However, with a speech synthesizer you can choose any book you want to listen to and create an audio file from it.
The speech synthesizer is important for blind and visually impaired people and, in particular, for information kiosks, which are now used in banks, airports and railway stations. Information kiosks give not only visual information (shown on the screen), but voice information as well. This information is now usually pre-recorded and played back when necessary. However, after any change it has to be re-recorded. Using a speech synthesizer simplifies the task and reduces its cost.
Another example is informing customers by phone. For instance, some organizations have to notify customers of overdue rent or telephone bills. It would also be wise to use a speech synthesizer here.
By embedding a speech synthesizer in an e-mail client, you can listen to incoming mail while doing something else. You can also, for example, convert an electronic newspaper into an audio file and listen to it on the way to work.
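The kiosk and phone-notification cases come down to generating the message text from current data and handing it to a synthesizer, instead of re-recording audio after every change. The sketch below shows only the text-generation step; the function name, message wording and currency are hypothetical, and the call to an actual synthesizer is deliberately left out.

```python
def debt_notice(service: str, amount: float, currency: str = "rubles") -> str:
    """Build the text of a debt reminder that a synthesizer would then read aloud."""
    # The wording is an illustrative assumption, not a real organization's script.
    return f"Your outstanding balance for {service} is {amount:.2f} {currency}."


# When the amount changes, only the generated text changes; nothing is re-recorded.
message = debt_notice("telephone", 12.5)
```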
— Are there many such programs in the world?
— Yes, of course. Programs exist for the majority of modern languages. There are several systems for the Russian language whose quality is comparable with that of the system we have created. The development of a speech synthesizer for each language has its own characteristics.
— You are working on speech synthesis for the Belarusian language, aren’t you?
— Yes. But the quality of the program does not satisfy us.
— What’s the problem?
— One of the main features of creating a speech synthesis system is the development of linguistic and acoustic information resources. To synthesize speech from text, you need to know where to put the stress in each word. Belarusian and Russian do not have fixed stress, so you need to create an electronic dictionary of stresses containing the largest possible number of words. Another problem is the intonation of speech. To make synthesized speech sound “right”, a database of intonations of the Belarusian language should be created. To voice arbitrary text, it is also necessary to have a voice database that contains all the sounds of the language and their basic variants. Such a database for the Russian language contains about 800 short audio segments. It has to be supplemented with the sounds specific to the Belarusian language.
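The role of the stress dictionary can be shown with a small sketch. It assumes a dictionary that maps a word to the number of its stressed syllable; the two entries, the function name and the choice to mark stress with a combining acute accent are illustrative assumptions, not the format used by the laboratory.

```python
# Vowel letters of Belarusian and Russian (lowercase).
VOWELS = set("аеёиіоуыэюя")

# Toy stress dictionary: word -> 1-based index of the stressed syllable.
STRESS_DICT = {
    "малако": 3,  # "milk": stress on the last syllable
    "сабака": 2,  # "dog": stress on the second syllable
}


def mark_stress(word: str) -> str:
    """Insert a combining acute accent after the stressed vowel, if known."""
    syllable = STRESS_DICT.get(word.lower())
    if syllable is None:
        return word  # unknown word: a real system would need a fallback rule
    count = 0
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            count += 1
            if count == syllable:
                return word[: i + 1] + "\u0301" + word[i + 1 :]
    return word
```

Words missing from the dictionary are returned unmarked, which is why coverage matters: the more words the dictionary contains, the less often the synthesizer has to guess.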
— Are your developments available for users?
— We offer the system we developed for creating and voicing audiobooks, “aBookForge”, as a software product that can be purchased by any user. The Institute has concluded a license agreement with a private firm, which sells it.
— Tell us about the project “Talking Head”.
— It is audio-visual speech synthesis. Audio-visual speech synthesis technology not only voices text but also displays the head and the articulatory organs (lips, cheeks, jaw, etc.) as the text is pronounced. There are two approaches to building an audio-visual speech synthesizer: creating a stylized three-dimensional model of a “talking head”, and creating a personal two-dimensional “talking head” of a particular person on the basis of photographs of his or her face pronouncing certain sounds. Audio-visual text-to-speech systems are in demand not only among people with sight problems, but also among the hard of hearing, as they can read the “talking head’s” lips.
— In your opinion, what are the prospects for the development of speech technologies in Belarus?
— Over the last 15-20 years speech technologies have developed rapidly. Speech recognition, text-to-speech synthesis, and voice-based identification and identity verification have now reached high quality and are used in many practical applications. Nevertheless, existing systems are being applied in many new practical fields, and new ways of improving them are being developed. Speech technology development has high potential in Belarus. TTS systems can be further developed and deployed in public transport announcement systems, in the teaching of Russian and Belarusian, and in self-service terminals.