Part-of-Speech Tagger


The online service “Part-of-Speech Tagger” determines which part of speech each word of a text belongs to. The input is a text in Belarusian or Russian; the output is a list of the words of the text, each annotated with its part of speech.

 

Basic terms and concepts

Parts of speech are classes of words that share a common general meaning, common morphological features and a common syntactic role. A word can be assigned to a part of speech only on the basis of a set of established criteria. The following factors are taken into account when determining a word's part of speech:

  • what it usually denotes (an object, action, quality, etc.);
  • which grammatical forms it can take;
  • which word-formation means are typical of it;
  • what functions it performs in a sentence [1].

 

Practical value

Accurately identifying the part of speech of each word in a text matters whenever the meaning of a word depends on its part of speech. For example, translators can use the service when a text contains a word that may belong to different parts of speech and is therefore hard to translate. The service can also be used in machine translation programs.

 

Service features

The service can draw on several dictionaries. The user selects or deselects each of them by ticking or clearing the checkbox next to the dictionary's name.

 

UI description

The UI of the service is shown in Figure 1.

Figure 1. UI of the service “Part-of-Speech Tagger”

On the service page, the user enters a text in one of the two supported languages (Belarusian or Russian) whose words are to be assigned to parts of speech. Separately, the user can add known words whose part of speech is already established.

The UI has the following areas:

  • text input area;
  • input area for known words and the parts of speech they belong to;
  • output area listing the words of the text with their parts of speech;
  • output area for unknown words.

To get the list of words with their parts of speech, click “Show the list of words with parts of speech!”.

 

Use case for working with the service

  1. Enter text in the input field on the service page.
  2. In the “Known words” area, enter the known words, each joined to its part of speech with the symbol “_” (Figure 1).
  3. In the dictionary selection area, tick the dictionaries you need (Figure 1).
  4. Click “Show the list of words with parts of speech!” to obtain the results (Figure 2).
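Step 2 expects each known word to be joined to its part of speech with “_”. As a hedged illustration, the hypothetical helper `formatKnownWords` below builds such entries from word/part-of-speech pairs. Separating the entries with spaces is an assumption borrowed from the knownList value in the API example on this page, not a documented rule of the UI field.

```javascript
// Hypothetical helper (not part of the service): joins each
// (word, part of speech) pair with the delimiter ("_" by default) and
// separates the entries with spaces, as in the API example's knownList value.
function formatKnownWords(pairs, delimiter = "_") {
  return pairs
    .map(([word, partOfSpeech]) => {
      // Reject parts that would make the entry ambiguous when split again.
      for (const part of [word, partOfSpeech]) {
        if (part === "" || /\s/.test(part) || part.includes(delimiter)) {
          throw new Error(`Cannot encode "${part}" as a known-word entry`);
        }
      }
      return `${word}${delimiter}${partOfSpeech}`;
    })
    .join(" ");
}
```

Entries containing whitespace or the delimiter itself are rejected, because the sketch assumes entries are split on whitespace and each entry on the delimiter.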

Figure 2. Results of part-of-speech identification by the service

 

Access to the service via the API

To access the service «Part-of-Speech Tagger» via the API, send an AJAX request (type: POST) to https://corpus.by/PartOfSpeechTagger/api.php. The following parameters are passed in the data object:

  • text — arbitrary input text.
  • knownList — a list of words with user-defined parts of speech.
  • localDelimiter — separator used in the resulting information.
  • dictionaryNames — flag for showing the names of the dictionaries from which the information is taken.
  • horizontalFormat — flag for putting all the resulting information on one line; if the flag is not set, the information for each word is given on a separate line.
  • decodedTags — flag for decoding tags.
  • Flags for selecting the dictionaries to use:
    • sbm1987 — «Слоўнік беларускай мовы. Арфаграфія. Арфаэпія. Акцэнтуацыя. Словазмяненне / пад рэд. М.В. Бірылы. – Мінск, 1987»;
    • sbm2012initial — «Слоўнік беларускай мовы. / навук. рэд. А.А. Лукашанец, В.П. Русак. — Мінск : Беларус. навука, 2012»;
    • noun2013 — nouns by the book «Граматычны слоўнік назоўніка / навук. рэд. В.П. Русак. – Мінск : Беларус. навука, 2013»;
    • adjective2013 — adjectives by the book «Граматычны слоўнік прыметніка, займенніка, лічэбніка, прыслоўя / навук. рэд. В.П. Русак. – Мінск : Беларус. навука, 2013»;
    • numeral2013 — numerals by the book «Граматычны слоўнік прыметніка, займенніка, лічэбніка, прыслоўя / навук. рэд. В.П. Русак. – Мінск : Беларус. навука, 2013»;
    • pronoun2013 — pronouns by the book «Граматычны слоўнік прыметніка, займенніка, лічэбніка, прыслоўя / навук. рэд. В.П. Русак. – Мінск : Беларус. навука, 2013»;
    • verb2013 — verbs by the book «Граматычны слоўнік дзеяслова / навук. рэд. В.П. Русак. – Мінск : Беларус. навука, 2013»;
    • adverb2013 — adverbs by the book «Граматычны слоўнік прыметніка, займенніка, лічэбніка, прыслоўя / навук. рэд. В.П. Русак. – Мінск : Беларус. навука, 2013»;
    • zalizniak — «Грамматический словарь русского языка: Словоизменение / А.А. Зализняк. — Москва : Русский язык, 1980. — 880 c.»;
    • new — dictionary of the text-to-speech system;
    • S2016_01, S2016_02, S2016_03, S2017_04, S2017_05 — user dictionaries.
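The parameter list above can also be assembled programmatically. The sketch below is a hypothetical helper, `buildTaggerFields`, not part of the service. It assumes that flags are sent as 1 or 0 and that a dictionary is selected by sending its marker with the value 1, as in the example request on this page, and its defaults mirror that example. Reading the user-dictionary entry as five separate markers (S2016_01 to S2017_05) is also an assumption.

```javascript
// Dictionary markers from the parameter list. ASSUMPTION: the user
// dictionaries are five separate markers, S2016_01 to S2017_05.
const DICTIONARIES = [
  "sbm1987", "sbm2012initial", "noun2013", "adjective2013", "numeral2013",
  "pronoun2013", "verb2013", "adverb2013", "zalizniak", "new",
  "S2016_01", "S2016_02", "S2016_03", "S2017_04", "S2017_05",
];

// Hypothetical helper: returns a plain object with the POST fields.
// ASSUMPTION: flags are encoded as 1/0, as in the example request.
function buildTaggerFields(text, {
  knownList = "",
  localDelimiter = "_",
  dictionaryNames = true,
  horizontalFormat = false,
  decodedTags = false,
  dictionaries = [],
} = {}) {
  const fields = {
    text,
    knownList,
    localDelimiter,
    dictionaryNames: dictionaryNames ? 1 : 0,
    horizontalFormat: horizontalFormat ? 1 : 0,
    decodedTags: decodedTags ? 1 : 0,
  };
  for (const name of dictionaries) {
    if (!DICTIONARIES.includes(name)) {
      throw new Error(`Unknown dictionary marker: ${name}`);
    }
    fields[name] = 1; // mark the dictionary as selected
  }
  return fields;
}
```

Validating the markers on the client catches typos early; an unselected dictionary is simply left out of the fields rather than sent with 0.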

Example of an AJAX request:

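$.ajax({
   type: "POST",
   url: "https://corpus.by/PartOfSpeechTagger/api.php",
   data: {
      "text": "Груша цвіла апошні грод.",
      "knownList": "груша_назоўнік цвіла_дзеяслоў",  // known words with user-defined parts of speech
      "localDelimiter": "_",
      "dictionaryNames": 1,
      "horizontalFormat": 0,
      "decodedTags": 0,
      "sbm1987": 1,         // use the sbm1987 dictionary
      "sbm2012initial": 1   // use the sbm2012initial dictionary
   },
   success: function (msg) { /* handle the server reply */ },
   error: function () { /* handle a failed request */ }
});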
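For environments without jQuery, the same request can be sketched with the Fetch API. This is a hedged sketch: `requestTagging` and `API_URL` are hypothetical names, and sending the fields as application/x-www-form-urlencoded is an assumption based on how jQuery encodes a plain `data` object for a POST request, not on the service's documentation.

```javascript
const API_URL = "https://corpus.by/PartOfSpeechTagger/api.php";

// Sketch of the POST request without jQuery.
// ASSUMPTION: the endpoint accepts form-encoded fields, as jQuery sends them.
// fetchImpl defaults to the global fetch and can be replaced in tests.
async function requestTagging(fields, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(API_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8" },
    body: new URLSearchParams(fields).toString(), // numbers become strings, e.g. "sbm1987=1"
  });
  if (!response.ok) {
    throw new Error(`Tagger request failed with HTTP ${response.status}`);
  }
  return response.json(); // the reply is described as a JSON array
}
```

For example, `requestTagging({ text: "Груша цвіла апошні грод.", localDelimiter: "_", dictionaryNames: 1, sbm1987: 1 }).then((reply) => console.log(reply[0].result))` would print the result string if the assumption about the encoding holds. Passing a different `fetchImpl` lets the function be exercised without network access.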

The server returns a JSON array with the following fields:

  • text — input text.
  • result — list of words with information about the part of speech each word belongs to.
  • unknownWords — list of words unknown to the service.

For example, the AJAX request above produces the following reply:

[
   {
      "text": "Груша цвіла апошні грод.",
      "result": "груша_назоўнік_known
цвіла_дзеяслоў_known
апо+шні_JJMO_sbm1987_апо+шні_JJMA_sbm1987_апо+шні_невядомаяКатэгорыя_sbm2012initial
грод_НевядомаяЧасц
._ЗнакПрыпынку",
      "unknownWords": "грод"
   }
]
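The result string can be split into per-word records on the client. The hypothetical `parseTaggerResult` below is inferred from the single example reply above and is not a documented format. It assumes horizontalFormat = 0 (one line per word, separated by newline characters), fields separated by localDelimiter, and, with dictionaryNames = 1, analyses given as (form, tag, dictionary) triples, except that lines with exactly two fields (an unknown word such as грод, or punctuation) carry only a word and a tag. Splitting unknownWords on whitespace is likewise an assumption for replies with several unknown words.

```javascript
// Hypothetical parser for the "result" field (format inferred from one example).
// Returns one record per line: the first field as the word, plus its analyses.
function parseTaggerResult(result, delimiter = "_", withSources = true) {
  return result
    .split(/\r?\n/)                        // one line per word
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const fields = line.split(delimiter);
      if (fields.length === 2) {
        // Unknown word or punctuation: word and tag only, no dictionary.
        return { word: fields[0], analyses: [{ form: fields[0], tag: fields[1], source: null }] };
      }
      const size = withSources ? 3 : 2;    // (form, tag[, dictionary]) groups
      const analyses = [];
      // Incomplete trailing groups, if any, are ignored.
      for (let i = 0; i + size <= fields.length; i += size) {
        analyses.push({
          form: fields[i],                 // may contain a stress mark, e.g. "апо+шні"
          tag: fields[i + 1],
          source: withSources ? fields[i + 2] : null,
        });
      }
      return { word: fields[0], analyses };
    });
}

// Hypothetical helper: splits unknownWords on whitespace (assumed separator).
function parseUnknownWords(unknownWords) {
  return unknownWords.split(/\s+/).filter((w) => w !== "");
}
```

Because fields are split on localDelimiter, a delimiter that can also occur inside words or tags would make the split ambiguous; choosing a rarer delimiter in the request reduces that risk.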

 

Links to sources

Service page: https://corpus.by/PartOfSpeechTagger/?lang=be

 

References

  1. Часціны мовы // Вікіпедыя [Electronic resource]. — 2017. Access mode: https://be.wikipedia.org/wiki/Часціны_мовы. — Date of access: 15.03.2017.
