The «Word Paradigm Generator» service produces the paradigm of a word. It takes a word or its word forms as input and looks up the paradigm in the dictionary. If no ready-made paradigm is found, the user receives a paradigm generated for the entered word, that is, a paradigm built on the basis of similarly spelled words. In the resulting paradigms, stress is marked with the “+” symbol, and special tags are separated from the word by an underscore “_”.
Basic terms and concepts
Grammatical meaning of a word – the characteristics of a word in terms of the grammatical categories it belongs to. Grammatical categories include, for example, gender, number, case, declension, mood, aspect and others. Grammatical meanings help to classify the vocabulary of a language. For example, the Belarusian nouns дрэва (“tree”) and сонца (“sun”) have different lexical meanings but the same grammatical meaning: both are common, inanimate, neuter, singular nouns in the nominative case [1].
Lexeme, or paradigmatic word – a word as an abstract (non-textual, dictionary) unit of language [2, p. 20].
Lemma – the initial form of a word. For nouns, for example, it is the nominative singular form (the lemma of clouds is cloud in English; the lemma of аблокаў is воблака in Belarusian).
Paradigm – a set of all word forms of the word. For example, tree, trees in English or дрэва, дрэва, дрэву, дрэва, дрэвам, дрэве, дрэвы, дрэваў, дрэвам, дрэвы, дрэвамі, дрэвах in Belarusian.
Word – one of the basic structural units of a language, which serves to name concepts.
Word form – a form of a word that expresses a particular grammatical meaning while preserving the word's lexical meaning.
NooJ – software for developing linguistic processing tools [3, 4].
The word forms of one lexeme make up its inflectional paradigm (wood, woods, or лесу, лесам, лесе, etc.), and this paradigm is what the service outputs.
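The relation between lemma, paradigm and word form can be pictured as a small lookup structure. The sketch below is only an illustration: the object shape and the `buildFormIndex` helper are hypothetical names, not part of the service, and the data are the tree and дрэва examples given above.

```javascript
// Hypothetical in-memory representation: a lemma plus the word forms of its paradigm.
const paradigms = [
  {
    lemma: "дрэва",
    forms: ["дрэва", "дрэва", "дрэву", "дрэва", "дрэвам", "дрэве",
            "дрэвы", "дрэваў", "дрэвам", "дрэвы", "дрэвамі", "дрэвах"],
  },
  { lemma: "tree", forms: ["tree", "trees"] },
];

// Build an index from each distinct word form to the lemmas it may belong to.
function buildFormIndex(list) {
  const index = new Map();
  for (const { lemma, forms } of list) {
    for (const form of new Set(forms)) {
      if (!index.has(form)) index.set(form, []);
      index.get(form).push(lemma);
    }
  }
  return index;
}

const formIndex = buildFormIndex(paradigms);
console.log(formIndex.get("дрэвамі")); // lemmas that have the form "дрэвамі"
console.log(new Set(paradigms[0].forms).size); // number of distinct forms of "дрэва"
```

Several cells of a paradigm may share one spelling (дрэва appears three times in the singular), which is why the index is built from distinct forms.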
Practical value
The service helps in creating linguo-acoustic resources, in particular dictionaries. For example, the dictionaries of «Spell Checker», another service of the www.corpus.by platform, were created with it, namely the user dictionaries of the Belarusian language S2016_01 [5] and S2016_03 [6]. The results are also useful to individual users who need to look up a particular form of a word.
Service Features
The service uses the SBM1987 dictionary, which is based on the publication «Слоўнік беларускай мовы. Арфаграфія. Арфаэпія. Акцэнтуацыя. Словазмяненне / пад рэд. М.В. Бірылы. – Мінск, 1987» [7] and is also used by the «Spell Checker» service.
The tags displayed after the “_” symbol indicate the grammatical meaning of the word form, for example, part of speech, gender, number, case, etc. The service also relies on them to generate the paradigm of an input word from similar words with the same grammatical meaning when the input word cannot be found in the service dictionaries. The tags are used internally by the service, and their description is currently not publicly available. In the future they are to be exposed as user interface elements, namely drop-down menus that will let the user set the desired grammatical meaning of the word step by step. At present the user can select only the part of speech of the entered word in the drop-down menu.
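A line of the generated output can be split into its parts mechanically. The following is a minimal sketch, assuming that “+” directly follows the stressed vowel (as in абма+кванне from the API example in this document). Since the tag set is not publicly documented, the tag is kept as an opaque string. The function name `parseTaggedForm` is hypothetical.

```javascript
// Split one output line such as "абма+кванне_NNINO" into word form, stress position and tag.
// Assumption: "+" is placed immediately after the stressed vowel.
function parseTaggedForm(line) {
  const sep = line.lastIndexOf("_");
  if (sep === -1) throw new Error("no tag separator in: " + line);
  const marked = line.slice(0, sep); // word form with the stress mark
  const tag = line.slice(sep + 1);   // opaque grammatical tag
  const plus = marked.indexOf("+");
  return {
    form: marked.replace(/\+/g, ""),              // word form without stress marks
    stressedIndex: plus === -1 ? null : plus - 1, // index of the stressed letter in `form`
    tag,
  };
}

const parsedForm = parseTaggedForm("абма+кванне_NNINO");
console.log(parsedForm.form, parsedForm.stressedIndex, parsedForm.tag);
```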
At the moment, the service generates paradigms only for words of the Belarusian language. Support for other languages can be added by supplying dictionaries and linguo-acoustic resources for those languages.
Interface Description
The graphical interface of the service is shown in figure 1.
The interface contains the following areas:
- Input field for word form(s);
- Choice of processing method (dictionary);
- Auxiliary selection of a tag and/or part of speech;
- Button «Generate probable paradigms!», which starts processing and returns the results.
Usage scenarios
Scenario 1. Generating a paradigm missing from the dictionary on the basis of similarly spelled words
- Enter the desired word and/or its word forms in the input field (for example, аўдыягід).
- Below the input field, select «Processing according to wordforms dictionary».
- To get only the paradigms closest to the correct one rather than all possible paradigms, select a part of speech in the «All parts of speech» field (аўдыягід is a noun). Only paradigms built from similar words of the selected part of speech will then be proposed.
- Click «Generate probable paradigms!» and get the result. The service finds similarly spelled words and offers the user paradigms modelled on them. The absence of a paradigm in the dictionary is indicated by the line #This paradigm is generated based on the following words (figure 2).
- Review the proposed paradigms, select the correct or the closest one, and make corrections if necessary.
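How a paradigm might be built "using the example" of another word can be sketched as ending transfer: take the endings of a model paradigm and attach them to the stem of the new word. This is an assumption made for illustration, not the service's documented algorithm, and the helper names are hypothetical. The model forms are the singular forms of абмакванне from the API response example in this document (stress marks removed), and падтакванне is one of the similar words listed there.

```javascript
// Longest common prefix of a list of strings (used as the stem of the model paradigm).
function commonPrefix(words) {
  let prefix = words[0];
  for (const w of words) {
    let i = 0;
    while (i < prefix.length && i < w.length && prefix[i] === w[i]) i++;
    prefix = prefix.slice(0, i);
  }
  return prefix;
}

// Generate forms for `lemma` by copying the endings of a model paradigm.
// modelForms[0] is taken to be the model's lemma. Returns null when the
// input word does not end the same way as the model lemma.
function transferEndings(lemma, modelForms) {
  const stem = commonPrefix(modelForms);
  const endings = modelForms.map((f) => f.slice(stem.length));
  const lemmaEnding = endings[0];
  if (!lemma.endsWith(lemmaEnding)) return null;
  const newStem = lemma.slice(0, lemma.length - lemmaEnding.length);
  return endings.map((e) => newStem + e);
}

const modelForms = ["абмакванне", "абмаквання", "абмакванню",
                    "абмакванне", "абмакваннем", "абмакванні"];
const generatedForms = transferEndings("падтакванне", modelForms);
console.log(generatedForms);
```

The sketch ignores stress and stem alternations, which is one reason why generated paradigms still need to be reviewed by the user.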
For example, the service dictionary contains no paradigm for the word аўдыягід. When scenario 1 is carried out, the result shows paradigms based on all similarly spelled words found in the dictionary: гід, альдэгід, поліфармальдэгід, фармальдэгід, агід, эгід (figure 2).
The paradigm based on the word гід is clearly the closest to the correct one, but it contains unsuitable accusative singular and plural forms, because гід is an animate noun, whereas аўдыягід is inanimate. Paradigms generated from the other words contain even more discrepancies, because the service took as models inanimate nouns (альдэгід, поліфармальдэгід, фармальдэгід) and word forms of feminine nouns (агіда, эгіда).
Difficulties may arise when generating paradigms of rarely used or specialized words, or of words that are new to the language. For the word агмень, for instance, the service offers 17 candidate paradigms generated from nouns with different grammatical meanings. In such cases the user has to rely on their own knowledge and on reference literature (dictionaries, reference books, etc.) when choosing a paradigm.
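The search for similarly spelled words can be approximated by comparing word endings. The sketch below ranks candidate entries by the length of the suffix they share with the input. This is an assumed similarity measure chosen for illustration, since the measure actually used by the service is not documented. The function names and the `minShared` threshold are hypothetical.

```javascript
// Length of the longest common suffix of two words.
function commonSuffixLength(a, b) {
  let n = 0;
  while (n < a.length && n < b.length &&
         a[a.length - 1 - n] === b[b.length - 1 - n]) {
    n++;
  }
  return n;
}

// Keep the entries sharing at least `minShared` final letters with the input,
// longest shared suffix first.
function rankBySuffix(word, entries, minShared = 3) {
  return entries
    .map((entry) => ({ entry, shared: commonSuffixLength(word, entry) }))
    .filter((c) => c.shared >= minShared)
    .sort((x, y) => y.shared - x.shared);
}

// The similar word forms named above, plus дрэва as an unrelated entry.
const dictionaryEntries = ["гід", "альдэгід", "поліфармальдэгід",
                           "фармальдэгід", "агід", "эгід", "дрэва"];
const ranked = rankBySuffix("аўдыягід", dictionaryEntries);
console.log(ranked);
```

Under this measure all six similar entries share the same three final letters with аўдыягід, so spelling alone cannot single out the right model; the choice remains with the user.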
Scenario 2. Search for a word paradigm in a dictionary of word forms
- Enter the desired word and/or its word form (for example, дрэва) in the input field.
- Below the input field, mark «Processing according to wordforms dictionary».
- To speed up processing, you can select a part of speech in the «All parts of speech» field (дрэва is a noun).
- Click «Generate probable paradigms!» and get the result. The paradigm found in the dictionary will be marked with the expression #The paradigm is found in the dictionary (figure 3).
If the input is invalid (punctuation marks, digits, Latin characters, overly long text, etc.), the service marks it accordingly. For example, entering the phrase зялёнае дрэва,%7# produces the following output:
зялёнае дрэва,%7# – incorrect request
Example of correct request: загадчык
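A similar check can be done on the client before sending a request. The sketch below is an assumption about what counts as a valid entry: each line must be a single word in Belarusian Cyrillic letters, optionally with an apostrophe or hyphen inside. The service's real rules (including any length limit) are not documented here, and the names are hypothetical.

```javascript
// Letters of the Belarusian Cyrillic alphabet (lowercase).
const LETTERS = "абвгдеёжзійклмнопрстуўфхцчшыьэюя";
// One word: letters, optionally joined by an apostrophe or a hyphen.
const WORD_RE = new RegExp(`^[${LETTERS}]+(?:['’-][${LETTERS}]+)*$`);

// Return the input lines that do not look like a single Belarusian word.
function findInvalidLines(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "" && !WORD_RE.test(line.toLowerCase()));
}

console.log(findInvalidLines("зялёнае дрэва,%7#")); // the whole line is reported
console.log(findInvalidLines("загадчык"));          // nothing is reported
```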
Scenario 3. Generation of a word paradigm according to the NooJ inflectional dictionary
- Enter the desired word and/or its word forms in the input field, each entry on a new line in the format «word,part of speech» (e.g. клад,NOUN кладзе,NOUN кладамі,NOUN).
- Below the input field, select «Processing according to dictionary of inflections in NooJ format». For this dictionary to work correctly, no part of speech should be selected.
- Click «Generate probable paradigms!» and get the result (figure 4).
As in the other scenarios, the user should carefully review the proposed paradigms and select the correct one.
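The input for this scenario can be assembled programmatically. A minimal sketch follows, in which `toNoojInput` is a hypothetical helper. NOUN is the only part-of-speech code shown in this document, so other codes would have to be taken from the NooJ dictionary in use.

```javascript
// Format word forms for the NooJ mode: one "form,POS" entry per line.
function toNoojInput(forms, partOfSpeech) {
  return forms.map((form) => `${form.trim()},${partOfSpeech}`).join("\n");
}

const noojInput = toNoojInput(["клад", "кладзе", "кладамі"], "NOUN");
console.log(noojInput);
```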
Access to the service through the API
To access the «Word Paradigm Generator» service via the API, send an AJAX POST request to https://corpus.by/WordParadigmGenerator/api.php. The input data array takes the following parameters:
- text — the word or words whose full paradigm is to be obtained. The format depends on the operating mode of the service. In the «general» mode, words are entered without additional marks, separated by line feeds. In the «nooj» mode, words are entered in the format «word,part of speech», for example, «клад,NOUN».
- mode — service operation mode. Two modes are available: «general» and «nooj».
- tag — a tag indicating a number of grammatical features of a word. For example, «NNAMO».
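- category — the grammatical category (part of speech) of the word. The following values are available: “усе” (all), “назоўнік” (noun), “прыметнік” (adjective), “лічэбнік” (numeral), “займеннік” (pronoun), “дзеяслоў” (verb), “прыслоўе” (adverb), “прыназоўнік” (preposition), “злучнік” (conjunction), “часціца” (particle), “выклічнік” (interjection).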
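The parameter list above can be turned into a small helper that assembles and checks the request data before it is sent. This is a sketch only: `buildPayload` is a hypothetical name, and the defaults chosen here (an empty tag and the category “усе”) are assumptions, not documented defaults of the API.

```javascript
const MODES = ["general", "nooj"];
const CATEGORIES = ["усе", "назоўнік", "прыметнік", "лічэбнік", "займеннік",
  "дзеяслоў", "прыслоўе", "прыназоўнік", "злучнік", "часціца", "выклічнік"];

// Assemble the request data, rejecting values the API description does not list.
function buildPayload({ words, mode = "general", tag = "", category = "усе" }) {
  if (!MODES.includes(mode)) throw new Error(`unknown mode: ${mode}`);
  if (!CATEGORIES.includes(category)) throw new Error(`unknown category: ${category}`);
  // Words are separated by line feeds, as required for the text parameter.
  return { text: words.join("\n"), mode, tag, category };
}

const payload = buildPayload({
  words: ["абмакванне", "абмакваннямі", "абмакванню"],
  category: "назоўнік",
});
console.log(payload);
```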
Example of AJAX request:
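$.ajax({
  type: "POST",
  url: "https://corpus.by/WordParadigmGenerator/api.php",
  data: {
    // In the «general» mode the words are separated by line feeds
    "text": "абмакванне\nабмакваннямі\nабмакванню",
    "mode": "general",
    "tag": "",
    "category": "назоўнік"
  },
  // msg receives the server response described below
  success: function(msg){ }
});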
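The same request can be made without jQuery. The sketch below uses the standard `fetch` and `URLSearchParams` APIs, assuming the endpoint accepts the same form-encoded body that jQuery sends for a plain data object. `buildRequest` and `requestParadigms` are hypothetical names, and whether the server permits cross-origin browser requests is not stated in this document.

```javascript
const API_URL = "https://corpus.by/WordParadigmGenerator/api.php";

// Build the arguments for fetch(); kept separate so they can be inspected without network access.
function buildRequest(params) {
  return {
    url: API_URL,
    options: {
      method: "POST",
      // A URLSearchParams body is sent as application/x-www-form-urlencoded.
      body: new URLSearchParams(params),
    },
  };
}

// Send the request and parse the JSON array returned by the server.
async function requestParadigms(params) {
  const { url, options } = buildRequest(params);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
}

// Usage (performs a real network call, so it is left commented out):
// requestParadigms({ text: "дрэва", mode: "general", tag: "", category: "назоўнік" })
//   .then((items) => console.log(items[0].result));
```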
The server will return a JSON array with the following parameters:
- text — input words.
- result — summary list of paradigms.
For example, using the above AJAX request, the following response will be generated:
[
{
"text": "абмакванне
абмакваннямі
абмакванню",
"result": "#This paradigm is generated based on the following words : абвалакванне, адскакванне, аплакванне, вывалакванне, завалакванне, звалакванне, падскакванне, падтакванне, праскакванне, развалакванне, саскакванне, узвалакванне, ускакванне
абма+кванне_NNINO
абма+квання_NNING
абма+кванню_NNIND
абма+кванне_NNINA
абма+кваннем_NNINI
абма+кванні_NNINR <…>"
}
]
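The result string can then be broken down on the client. The following is a minimal sketch with a hypothetical `parseResult` helper. It assumes that lines are separated by line feeds, that a line starting with “#” is a status message, and that every other line has the form «word form_TAG». It handles a single paradigm only, and the sample is a shortened excerpt of the response above.

```javascript
// Parse the "result" string of one response item into a message and tagged entries.
function parseResult(result) {
  const parsed = { message: null, entries: [] };
  for (const raw of result.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    if (line.startsWith("#")) {
      parsed.message = line.slice(1).trim(); // status line without the leading "#"
      continue;
    }
    const sep = line.lastIndexOf("_");
    if (sep === -1) continue; // skip lines that carry no tag
    parsed.entries.push({ form: line.slice(0, sep), tag: line.slice(sep + 1) });
  }
  return parsed;
}

const sampleResult =
  "#This paradigm is generated based on the following words : абвалакванне, адскакванне\n" +
  "абма+кванне_NNINO\n" +
  "абма+квання_NNING";
const parsedResult = parseResult(sampleResult);
console.log(parsedResult.message);
console.log(parsedResult.entries);
```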
References to sources
Page of the service: https://corpus.by/WordParadigmGenerator/?lang=en
Cross references
- Грамматическое значение слова
- Зализняк А.А. «Русское именное словоизменение» с приложением избранных работ по современному русскому языку и общему языкознанию. – М. : Языки славянской культуры, 2002. – I-VIII. – 752 с.
- NooJ on Wikipedia
- Official NooJ page
- The user dictionary of Belarusian language of the service «Spell Checker» S2016_01
- The user dictionary of Belarusian language of the service «Spell Checker» S2016_03
- Слоўнік беларускай мовы. Арфаграфія. Арфаэпія. Акцэнтуацыя. Словазмяненне / пад рэд. М.В. Бірылы. – Мінск, 1987.
- Зяноўка, Я.С. Стварэнне базы незафіксаванай нарматыўнымі крыніцамі лексікі праз corpus.by / Я.С. Зяноўка // Беларуская граматыка: ад Браніслава Тарашкевіча да сучаснасці : зборнік матэрыялаў Міжнароднай навуковай канферэнцыі (Мінск, 19–20 студзеня 2017 г.) / Нац. акад. навук Беларусі, Цэнтр даслед. беларус. культ., мовы і літ-ры, Ін-т мовазнаўства імя Якуба Коласа. — Мінск : Чатыры чвэрці, 2017. — C. 84-90.
- Zanouka, E. The Enlargement of Electronic Lexical Database by Computational On-line Free System / E. Zanouka // Открытые семантические технологии проектирования интеллектуальных систем = Open Semantic Technologies for Intelligent Systems : материалы междунар. науч.-техн. конф. Вып. 1 (Минск, 16-18 февраля 2017 г.). / редкол. : В. В. Голенков (отв. ред.) [и др.]. — Минск : БГУИР, 2017. — C. 179-182.
- Гецэвіч, Ю.С. Інтэрнэт-сістэма генерацыі парадыгмаў слова для папаўнення электронных граматычных слоўнікаў / Ю.С. Гецэвіч, В.В. Варановіч, С.І. Лысы, І.В. Рэентовіч, Я.С. Качан // Международный конгресс по информатике: информационные системы и технологии=International Congress on computer science: Information systems and technologies / БГУ; под ред. С.В. Абламейко. — Минск, 2016. — C. 584-588.
- Hetsevich, Y. Semi-automatic Part-of-Speech Annotating for Belarusian Dictionaries Enrichment in NooJ / Yu. Hetsevich, V. Varanovich, E. Kachan, S. Lysy, I. Reentovich // NOOJ 2016 International Conference. Book of Abstracts. June 9-11, 2016, České Budějovice, Czech Republic / ed. Jan Radimský. — České Budějovice, University of South Bohemia in České Budějovice, 2016. – P. 47-48.
- Hetsevich, Y. Semi-automatic Part-of-Speech Annotating for Belarusian Dictionaries Enrichment in NooJ / Y. Hetsevich, V. Varanovich, E. Kachan, I. Reentovich, S. Lysy // Automatic Processing of Natural-Language Electronic Texts with NooJ: 10th International Conference, NooJ 2016, České Budějovice, Czech Republic, June 9-11, 2016, Revised Selected Papers / ed. L. Barone, M. Monteleone, M. Silberztein. — Springer, 2017. — P. 101-111.