The “Orthoepic Dictionary Generator” service converts the orthographic (spelling) form of Belarusian words into phonetic transcription in accordance with the norms of modern Belarusian speech. The service takes text in Belarusian as input and returns it in transcribed form: the output is a Cyrillic transcription of each word that reflects its correct pronunciation.
Let us consider how the “Orthoepic Dictionary Generator” service works.
To obtain a transcription, enter a word or a fragment of a spelling dictionary (the basis for creating the transcriptions) into the input field and click “Get text with transcriptions!/Атрымаць тэкст з транскрыпцыямі!”. A spelling dictionary entry, however, contains not only the word in its initial form but also some of its grammatical forms, as well as grammatical and stylistic labels. As a result, the service may produce transcriptions for elements you do not need. To avoid this, we suggest marking stress on the words that you need transcribed.
The service webpage also has an auxiliary “Clear!/Ачысціць!” button that quickly deletes the entered text.
To help improve the service continuously, the developers provided a feedback option that lets experts send a message reporting the problem when they find a mistake.
To show real results of the service, consider a fragment of the electronic spelling dictionary that serves as the basis for creating the orthoepic dictionary:
As the figure above shows, besides the initial forms of words, the dictionary also gives grammatical labels and word meanings in brackets (characteristics such as animacy matter when choosing word endings for different cases). In addition, an alternative ending may be given in brackets after a word form. As already noted, this additional data should not be transcribed, and the service ignores it. For example, after stresses are marked, the processed fragment of the electronic spelling dictionary looks as follows:
At the output, the service generates a Cyrillic transcription for each word and its forms that reflects their correct pronunciation. Thus, the Orthoepic Dictionary Generator presents the source data in transcribed form. This greatly facilitates the work of linguists creating the Belarusian orthoepic dictionary and allows the service to be used for other linguistic tasks. For example, a similar service could be created for the Ukrainian language.
Access to the service via the API
To access the “Orthoepic Dictionary Generator” service via the API, send an AJAX request (type: POST) to https://corpus.by/OrthoepicDictionaryGenerator/api.php. The following parameters are passed in the input data array:
- text: input text, which can be arbitrary text, a list of words (one word per line), or a fragment of a dictionary.
- stopWords: list of words that the service should not transcribe. Words are separated by spaces or newlines. In the resulting text, no transcription follows these words, and they are shown in italics.
- mode: processing mode. Three modes are available:
  - headwordsProcessing: the resulting text contains only the first word of each line and its transcription;
  - allWordsProcessing: a transcription appears after each word except stop words;
  - noojFormatProcessing: only the first word of each line is processed, and the result is given in NooJ format.
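The parameters above can be assembled in a small helper before the request is sent. The following is a minimal sketch, not part of the service: the names `MODES` and `buildRequestData` are hypothetical, and only the three parameter names and the three mode values are taken from the description above.

```javascript
// Hypothetical helper (not provided by the service): builds the data
// object from the three documented parameters.
const MODES = [
  "headwordsProcessing",
  "allWordsProcessing",
  "noojFormatProcessing"
];

function buildRequestData(text, stopWords, mode) {
  // Reject anything other than the three documented mode values.
  if (!MODES.includes(mode)) {
    throw new Error("Unknown mode: " + mode);
  }
  return {
    text: text,
    // The service accepts stop words separated by spaces or newlines;
    // this sketch joins them with single spaces.
    stopWords: stopWords.join(" "),
    mode: mode
  };
}
```

For example, `buildRequestData("саке́ н., нескл.", ["н.", "нескл."], "allWordsProcessing")` returns an object with the fields `text`, `stopWords` (here `"н. нескл."`) and `mode`, ready to be passed as the request data.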
Example of an AJAX request:
$.ajax({
    type: "POST",
    url: "https://corpus.by/OrthoepicDictionaryGenerator/api.php",
    data: {
        "text": "саке́ н., нескл.",
        "stopWords": "н. нескл.",
        "mode": "allWordsProcessing"
    },
    // msg holds the server reply
    success: function(msg) { },
    // called when the request fails
    error: function() { }
});
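For projects that do not use jQuery, the same request can be sketched with the standard `fetch` API. This sketch assumes that the endpoint accepts a form-encoded body, which is what `$.ajax` sends by default for a plain `data` object. The helper names `encodeForm` and `requestTranscription` are illustrative and not part of the service.

```javascript
// Address of the API as given in this document.
const API_URL = "https://corpus.by/OrthoepicDictionaryGenerator/api.php";

// Encode a plain object as application/x-www-form-urlencoded.
function encodeForm(data) {
  return new URLSearchParams(data).toString();
}

// Assumption: the server accepts a form-encoded POST body and replies
// with JSON, as in the jQuery example.
async function requestTranscription(data) {
  const response = await fetch(API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"
    },
    body: encodeForm(data)
  });
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  return response.json();
}
```

A call such as `requestTranscription({ text: "саке́ н., нескл.", stopWords: "н. нескл.", mode: "allWordsProcessing" })` returns a promise for the parsed server reply.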
The server returns a JSON array of objects with the following fields:
- text: input text.
- result: resulting text.
For example, the AJAX request above produces the following response:
[
    {
        "text": "саке́ н., нескл.",
        "result": "<b>саке́ </b> [сак’э́] <i>н.</i>, <i>нескл.</i>"
    }
]
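The `result` string mixes markup with the transcription, so a client may want to pull out word and transcription pairs. The sketch below is based only on the single response shown above: it assumes that each transcribed word is wrapped in `<b>…</b>` and followed by its transcription in square brackets. The function name `extractTranscriptions` is hypothetical, and the markup in other modes (for example NooJ format) may differ.

```javascript
// Extract { word, transcription } pairs from a result string.
// Assumption: the format is "<b>word </b> [transcription]", as in the
// sample response; other layouts are not handled.
function extractTranscriptions(result) {
  const pattern = /<b>([^<]*)<\/b>\s*\[([^\]]*)\]/g;
  const pairs = [];
  for (const match of result.matchAll(pattern)) {
    pairs.push({
      word: match[1].trim(),   // drop the trailing space inside <b>…</b>
      transcription: match[2]
    });
  }
  return pairs;
}

const sample = "<b>саке́ </b> [сак’э́] <i>н.</i>, <i>нескл.</i>";
console.log(extractTranscriptions(sample));
```

For the sample response, this gives one pair: the word `саке́` with the transcription `сак’э́`. The stop words in `<i>…</i>` are skipped because no bracketed transcription follows them.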
Links to sources
The webpage of the service: https://corpus.by/OrthoepicDictionaryGenerator/?lang=en