The “Lemmatizer” service determines the root forms (lemmas) of words. Any text in Belarusian can serve as input. The service returns a list of words with their root forms, together with a list of words whose root form could not be determined.


Access to the service via the API

To access the “Lemmatizer” service via the API, send a POST request to the AJAX address http://corpus.by/Lemmatizer/api.php. The data array carries the following parameters: the arbitrary input text (text), a list of user-defined initial forms of words (knownList), a delimiter for the output (localDelimiter), a flag indicating whether to show the dictionaries from which the information is taken (dictionaryNames), a flag indicating whether to return the final list on a single line (horizontalFormat), and flags that enable particular dictionaries.


The input data array has the following elements:

  • text: arbitrary input text in Belarusian.
  • knownList: a list of words with user-defined initial forms.
  • localDelimiter: the output delimiter, that is, the character that separates a word, its root form and the name of the dictionary in the final list.
  • dictionaryNames: a flag indicating whether to show the dictionaries from which the information is taken.
  • horizontalFormat: a flag indicating whether to place all the resulting information on one line; if the flag is not set, the information for each word is placed on a separate line.
  • Dictionary usage flags:
    • sbm1987: «Слоўнік беларускай мовы. Арфаграфія. Арфаэпія. Акцэнтуацыя. Словазмяненне / пад рэд. М.В. Бірылы. – Мінск, 1987».

Example of an AJAX request:

$.ajax({
   type: "POST",
   url: "http://corpus.by/Lemmatizer/api.php",
   data: {
      "text": "Груша цвіла апошні год. Усе галіны яе, усе вялікія расохі, да апошняга пруціка, былі ўсыпаны буйным бела-ружовым цветам.",
      "knownList": "расохі_расоха",
      "localDelimiter": "|",
      "dictionaryNames": 1,
      "horizontalFormat": 0,
      "sbm1987": 1
   },
   success: function(msg){ }
});
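
For readers who do not use jQuery, the same request can be sketched with the standard fetch API. This is a minimal sketch, not the service's reference client: the helper names buildLemmatizerBody and lemmatize are hypothetical, the default values are illustrative choices, and it assumes the service accepts the form-encoded body that jQuery sends by default for a data object.

```javascript
// Assumption: same endpoint as in the AJAX example above.
const API_URL = "http://corpus.by/Lemmatizer/api.php";

// Hypothetical helper: builds the form-encoded POST body from the
// parameters described above. Flags are sent as "1" or "0",
// mirroring the numeric values used in the AJAX example.
function buildLemmatizerBody(text, options = {}) {
  const params = new URLSearchParams();
  params.set("text", text);
  if (options.knownList) {
    params.set("knownList", options.knownList);
  }
  params.set("localDelimiter", options.localDelimiter ?? "|");
  params.set("dictionaryNames", options.dictionaryNames ? "1" : "0");
  params.set("horizontalFormat", options.horizontalFormat ? "1" : "0");
  // The sbm1987 dictionary is enabled unless explicitly switched off
  // (an illustrative default, not documented behaviour).
  params.set("sbm1987", options.sbm1987 === false ? "0" : "1");
  return params;
}

// Hypothetical helper: sends the request and parses the JSON reply.
async function lemmatize(text, options = {}) {
  const response = await fetch(API_URL, {
    method: "POST",
    body: buildLemmatizerBody(text, options),
  });
  if (!response.ok) {
    throw new Error("Lemmatizer request failed: " + response.status);
  }
  return response.json();
}
```

Passing a URLSearchParams object as the body makes fetch set the form-encoded content type automatically, which keeps the request close to what the jQuery call sends.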

The server returns a JSON array containing the input text (text), the final list of words with information about their root forms (result), and the list of words unknown to the service (unknownWords). For example, the AJAX request above produces the following reply:

      "text": "Груша цвіла апошні год. Усе галіны яе, усе вялікія расохі, да апошняга пруціка, былі ўсыпаны буйным бела-ружовым цветам.",
      "result": "гру+ша|груша|sbm1987

      "unknownWords": "пруціка"
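
The result string can be split back into per-word records on the client. The sketch below is one possible way to do that, under stated assumptions: the function name parseLemmatizerResult is hypothetical, entries are assumed to be separated by line breaks (either newline characters or `<br>` tags, since the reply above does not show which), and each entry is assumed to hold the word, its root form and, when dictionaryNames is set, the dictionary name, in that order.

```javascript
// Hypothetical helper: turns the "result" string into an array of records.
// Assumes horizontalFormat was 0, so each word sits on its own line.
function parseLemmatizerResult(result, delimiter = "|") {
  return result
    .split(/\r?\n|<br\s*\/?>/i)          // assumed line separators
    .map((line) => line.trim())
    .filter((line) => line.length > 0)   // skip empty lines
    .map((line) => {
      const [word, rootForm, dictionary] = line.split(delimiter);
      // dictionary is missing when dictionaryNames was not requested
      return { word, rootForm, dictionary: dictionary ?? null };
    });
}

const records = parseLemmatizerResult("гру+ша|груша|sbm1987");
console.log(records[0].rootForm); // prints "груша"
```

The delimiter argument should match the localDelimiter value sent with the request; a delimiter that can also occur inside words would make the split ambiguous.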

Example of API use: the web service “Lemmatizer via API” (http://corpus.by/LemmatizerViaApi/).


Links to sources

Service page: http://corpus.by/Lemmatizer/?lang=en
