PHP

Nearest Words Finder

The «Nearest Words Finder» service processes character sequences separated by whitespace and compares them with the sequences in a user dictionary. The service outputs an HTML table of «word – match» pairs, where the word is the original character sequence and the match is the dictionary sequence closest to the […]

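The overview does not say how closeness to a dictionary entry is measured. The sketch below assumes Levenshtein edit distance (the number of single-character insertions, deletions, and substitutions), a common choice that is not confirmed for this service; `levenshtein` and `nearest_words` are hypothetical names, and the input is split on whitespace.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    # Keep only the previous row of the dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def nearest_words(text: str, dictionary: list[str]) -> list[tuple[str, str]]:
    """Pair every whitespace-separated token with its closest dictionary entry."""
    pairs = []
    for word in text.split():
        # min() keeps the first entry among equally close candidates.
        match = min(dictionary, key=lambda entry: levenshtein(word, entry))
        pairs.append((word, match))
    return pairs
```

For example, `nearest_words("hause mause", ["house", "mouse", "horse"])` pairs `hause` with `house` and `mause` with `mouse`. A linear scan compares each word with every entry, which is fine for small dictionaries; a large dictionary would call for an index such as a BK-tree.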

Grammatical Dictionary Processor

The «Grammatical Dictionary Processor» service lets the user receive previously loaded lexicographic data from a grammatical dictionary, converted to the required format, as an HTML table, together with SQL statements for creating a database that stores the entered information in structured form. Basic terms and concepts. Parsing […]

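The overview does not describe the dictionary's record format or the database schema. Assuming a hypothetical table of word forms with their lemmas and part-of-speech tags, Python's standard `sqlite3` module can load the records and emit matching SQL statements through `Connection.iterdump()`. The output uses SQLite's SQL dialect, so another database system may need adjustments; `dictionary_to_sql` and the `entries` table are illustrative names only.

```python
import sqlite3


def dictionary_to_sql(rows: list[tuple[str, str, str]]) -> str:
    """Return SQL statements that recreate a table holding the given rows.

    The table name and columns are hypothetical; the service's real
    schema is not described in the overview.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (form TEXT NOT NULL, lemma TEXT NOT NULL, pos TEXT)"
    )
    # Parameter binding handles quoting, so apostrophes in words are safe.
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)
    conn.commit()
    # iterdump() yields the schema and the data as SQL text, one statement each.
    sql = "\n".join(conn.iterdump())
    conn.close()
    return sql
```

Calling it with `[("кнігі", "кніга", "NOUN"), ("сям'і", "сям'я", "NOUN")]` returns a script containing the `CREATE TABLE` statement and one `INSERT` per record, which can be replayed on an empty database with `executescript()`.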

UDC Code Finder

The «UDC Code Finder» service returns a list of Universal Decimal Classification (UDC) codes whose descriptions contain a given word. The input is the word to search for. As output, the user receives the following information about the UDC classes in whose descriptions the entered […]

Phonetic Minimizer

The «Phonetic Minimizer» service builds, from a corpus of Belarusian texts, a minimized set of sentences that covers all phonetic units present in the original corpus. The service takes either text entered by the user or a text base selected by the user. The user can set two minimization parameters: the base […]

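The overview does not name the minimization algorithm. One standard way to choose a small set of sentences covering every unit is the greedy set-cover heuristic, which repeatedly takes the sentence that adds the most still-uncovered units; it does not guarantee the smallest possible set. The sketch assumes each sentence has already been transcribed into a set of phonetic units (the transcription step and the service's two minimization parameters are not modeled), and `greedy_cover` is a hypothetical name.

```python
def greedy_cover(sentences: dict[str, set[str]]) -> list[str]:
    """Greedily choose sentences until every unit in the corpus is covered."""
    # All units that occur anywhere in the corpus must end up covered.
    uncovered = set().union(*sentences.values())
    chosen = []
    while uncovered:
        # Pick the sentence contributing the most uncovered units;
        # max() returns the first sentence among equal candidates.
        best = max(sentences, key=lambda s: len(sentences[s] & uncovered))
        chosen.append(best)
        uncovered -= sentences[best]
    return chosen
```

For example, with `{"s1": {"a", "b", "c"}, "s2": {"c", "d"}, "s3": {"d"}}` the function first picks `s1` (three new units) and then `s2`, which covers the remaining unit `d`.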

Service Demonstrator With Authorization

«Service Demonstrator With Authorization» adds an authorization mechanism to «Service Demonstrator». It is a ready-made open-source foundation for building new services for www.corpus.by, the Internet platform for text and speech processing, and it demonstrates how services built on it can work.

Romanizator

The «Romanizator» service converts Belarusian text written in Cyrillic into Latin script. The input is Belarusian text in the Belarusian Cyrillic alphabet, such as personal names, geographical names, or other information. After processing the text, at the output the […]

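The service's exact transliteration rules are not given in the overview. A table-driven sketch illustrates the basic idea: look up each Cyrillic letter and carry over its case. The table is a deliberately partial, illustrative subset whose letter values are intended to follow the Belarusian national romanization; context-dependent letters (е, ё, ю, я, л, ь and the apostrophe) are omitted, so this is not a complete romanization system. `TABLE` and `romanize` are hypothetical names.

```python
# Illustrative subset only: context-dependent letters (е, ё, ю, я, л, ь and
# the apostrophe) are omitted, so this is not a complete romanization system.
TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "h", "д": "d", "ж": "ž", "з": "z",
    "і": "i", "й": "j", "к": "k", "м": "m", "н": "n", "о": "o", "п": "p",
    "р": "r", "с": "s", "т": "t", "у": "u", "ў": "ŭ", "ф": "f", "х": "ch",
    "ц": "c", "ч": "č", "ш": "š", "ы": "y", "э": "e",
}


def romanize(text: str) -> str:
    """Transliterate letters found in TABLE, preserving initial capitals."""
    out = []
    for ch in text:
        latin = TABLE.get(ch.lower())
        if latin is None:
            out.append(ch)  # characters outside the table pass through unchanged
        elif ch.isupper():
            out.append(latin.capitalize())  # e.g. "Х" -> "Ch"
        else:
            out.append(latin)
    return "".join(out)
```

With this table, `romanize("Шчучын")` gives `Ščučyn` and `romanize("Гродна")` gives `Hrodna`.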

Phonetic Phenomena Searcher

The «Phonetic Phenomena Searcher» service identifies a particular phonetic phenomenon in the entered text. Access to the service via the API. To access «Phonetic Phenomena Searcher» via the API, send an AJAX request (type: POST) to https://corpus.by/PhoneticPhenomenaSearcher/api.php, passing the following parameters in the input data array: text – […]

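Based only on the endpoint and the `text` parameter named above, a request could be prepared with Python's standard library as follows. The sketch assumes the API accepts form-encoded fields (what jQuery's `$.ajax` sends by default for a plain `data` object); the other parameters are cut off in this overview, so `extra` merely stands in for them, and no response format is assumed. `build_request` is a hypothetical helper that prepares the request without sending it.

```python
import urllib.parse
import urllib.request

API_URL = "https://corpus.by/PhoneticPhenomenaSearcher/api.php"


def build_request(text: str, **extra: str) -> urllib.request.Request:
    """Prepare a form-encoded POST request to the service API.

    Only `text` is documented in the overview; anything passed in
    `extra` is a hypothetical additional parameter.
    """
    fields = {"text": text, **extra}
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
    )
```

Sending the request would then be `urllib.request.urlopen(build_request("Добры дзень"), timeout=30)`; how to interpret the response depends on the full API description.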

Alphabetical Subject Index Generator

The «Alphabetical Subject Index Generator» service converts the text of Universal Decimal Classification (UDC) tables into an alphabetical subject index (ASI). The input is a UDC table in the format «class code – class description», with the two fields separated by a tab and one class per line. The result of the […]

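How the service derives index headings is cut off in the overview. As a simplified sketch, assuming each class description serves as a single index heading (a real subject index usually extracts several subject terms per class), the tab-separated input can be parsed and inverted into an alphabetical list of headings with their codes. `build_subject_index` is a hypothetical name, and `casefold()` ordering follows Unicode code points, not the Belarusian alphabet.

```python
def build_subject_index(udc_table: str) -> list[tuple[str, str]]:
    """Turn «code<TAB>description» lines into (description, code) pairs
    sorted alphabetically by description."""
    entries = []
    for line in udc_table.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # A line without a tab raises ValueError here (malformed input).
        code, description = line.split("\t", 1)
        entries.append((description.strip(), code.strip()))
    # casefold() makes the ordering case-insensitive.
    entries.sort(key=lambda entry: entry[0].casefold())
    return entries
```

For example, the lines `54<TAB>Chemistry`, `51<TAB>Mathematics` and `53<TAB>Physics` become the entries Chemistry (54), Mathematics (51), Physics (53).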

N-gram Frequency Counter

The «N-gram Frequency Counter» tool counts the frequencies of n-grams, that is, sequences of n consecutive elements. The elements can be characters, tokens, words, or even units defined by regular expressions. The input is an arbitrary sequence of characters, and the output is a list of n-grams with their frequencies in the input text. […]

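A minimal sketch of the counting step, assuming the input has already been split into elements (a string counts as a sequence of characters, a list of words as word elements); the regular-expression mode mentioned above is not modeled, and `ngram_frequencies` is a hypothetical name.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def ngram_frequencies(elements: Sequence[Hashable], n: int) -> Counter:
    """Count every run of n consecutive elements."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Zipping n views shifted by 0..n-1 positions yields the n-grams as tuples.
    return Counter(zip(*(elements[i:] for i in range(n))))
```

For example, `ngram_frequencies("to be or not to be".split(), 2)` counts the bigram `("to", "be")` twice and every other bigram once. A sequence shorter than n yields an empty counter.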

Lemmatizer

The «Lemmatizer» service determines the initial forms (lemmas) of words. It takes arbitrary text in Belarusian or Russian and returns a list of the words in the input text with their initial forms, along with a list of words whose initial form could not be determined. The general […]

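The overview describes the two output lists but not the method behind them. Assuming a plain dictionary lookup from word forms to initial forms, the outputs could be produced as below; the four-entry dictionary is hypothetical example data, and `lemmatize` is an illustrative name. Splitting on `\w+` also breaks words containing an apostrophe, which a real tokenizer for Belarusian would need to handle.

```python
import re

# Hypothetical toy dictionary: word form -> initial form (lemma).
LEMMAS = {
    "кнігі": "кніга",
    "кніга": "кніга",
    "чытаю": "чытаць",
    "я": "я",
}


def lemmatize(text: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (word, lemma) pairs plus the words with no known lemma."""
    found, unknown = [], []
    # \w+ matches runs of Unicode letters and digits; punctuation is skipped.
    for word in re.findall(r"\w+", text.lower()):
        lemma = LEMMAS.get(word)
        if lemma is None:
            unknown.append(word)
        else:
            found.append((word, lemma))
    return found, unknown
```

For the sentence «Я чытаю кнігі ўвечары.» the first list pairs я, чытаю and кнігі with their lemmas, and ўвечары lands in the list of unknown words.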

Language Identifier

The «Language Identifier» service identifies the language of an arbitrary input text. At present, the service recognizes five languages: Belarusian, Russian, Ukrainian, English, and German. Basic terms and concepts. Language identification (or language guessing) is a problem in the field of natural language text processing, […]

UDC Decoder

The «UDC Decoder» service decodes Universal Decimal Classification (UDC) codes. The input is the UDC code to be decoded, and the output gives the following information about it: the class code, the class description in English, and the class description in Belarusian. Basic terms and concepts […]

Tokenizer

Extracts all tokens from the text.

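The one-line description does not define what counts as a token. The sketch below assumes two token classes: a run of letters and digits, optionally joined by an apostrophe or hyphen (so that Belarusian words such as сям'я stay whole), and any single punctuation character. `TOKEN_RE` and `tokenize` are hypothetical names.

```python
import re

# Assumed token classes: words (letters/digits, optionally joined by an
# apostrophe, typographic apostrophe or hyphen) and single non-space,
# non-word characters such as punctuation.
TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens, dropping whitespace."""
    return TOKEN_RE.findall(text)
```

For example, `tokenize("Сям'я чытае, а я пішу.")` returns the words Сям'я, чытае, а, я, пішу along with the comma and the final period as separate tokens.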

Tag Identifier

The «Tag Identifier» service provides supporting information about the tags in a text. The input can be arbitrary text or a character sequence containing tags. As output, the user receives the following supporting information about the tags in the text: a list of tags (single or […]

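The overview does not state which tag syntax the service recognizes. Assuming HTML/XML-style markup, a regular expression can list the tags and count how often each tag name occurs. This is a rough sketch rather than a markup parser: it ignores comments and CDATA sections and does not handle `>` inside attribute values. `TAG_RE` and `tag_counts` are hypothetical names.

```python
import re
from collections import Counter

# Matches <name ...>, </name> and <name/>, capturing the tag name.
# [^<>]* stops a stray "<" in plain text from swallowing the following tag.
TAG_RE = re.compile(r"<\s*/?\s*([A-Za-z][\w:-]*)[^<>]*>")


def tag_counts(text: str) -> Counter:
    """Count tag names (opening, closing and self-closing) in the text."""
    return Counter(name.lower() for name in TAG_RE.findall(text))
```

For `<p>Hi <b>there</b><br/></p>` the result counts `p` and `b` twice each (opening and closing) and `br` once.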

Specialized Phonetic Dictionary

The «Specialized Phonetic Dictionary» service displays transcriptions of word lists on specialized topics. At present, the dictionary contains Russian words from the everyday conversational domain. The service page shows a list of letters in alphabetical order, from which words with a […]

Unknown Words Processor

The «Unknown Words Processor» service supplements the existing dictionaries of a speech synthesizer with the words most often flagged as unknown or incorrect by the «Text-to-Speech Synthesizer», «Spell Checker», and «Voiced Electronic Grammatical Dictionary» services.

Speech Duration Predictor

The «Speech Duration Predictor» service lets the user find out online the approximate duration of a speech. The input is an electronic text in Belarusian, English, or Russian, typed in manually or pasted. As output, the user receives the result as an approximate speech duration […]

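The overview does not say how the duration is estimated. A simple model divides the word count by an assumed speaking rate; the default of 130 words per minute below is a hypothetical placeholder, not a figure from the service, and a real predictor would likely use different rates per language. `speech_duration_seconds` and `format_duration` are illustrative names.

```python
import re


def speech_duration_seconds(text: str, words_per_minute: float = 130.0) -> float:
    """Estimate how long reading `text` aloud takes at a fixed speaking rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    words = re.findall(r"\w+", text)
    return len(words) * 60.0 / words_per_minute


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as minutes and zero-padded seconds."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs:02d} s"
```

As a worked example, a 260-word text at 130 words per minute takes 260 × 60 / 130 = 120 seconds, which `format_duration` renders as `2 min 00 s`.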

Talking Head Synthesizer

The «Talking Head Synthesizer» service visualizes text entered by the user. It takes an electronic text as input, processes it, and produces a video file with an animated head speaking the entered phrase. The «talking head» conveys the facial expressions of a human head and the synthesized […]

Part-of-Speech Tagger

The «Part-of-Speech Tagger» service lets the user find out online which part of speech a word belongs to. The input is a text in Belarusian or Russian, and the output is a list of the words in the text, each with its part of speech. […]

Image Cropper

The «Image Cropper» service crops images quickly and easily into a rectangular or round shape. One or more images in .jpg or .png format are given as input; after processing, the user can download each input image as an 800×533 rectangle and as a circle 100 pixels in diameter […]

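The overview gives the output sizes (an 800×533 rectangle and a circle 100 pixels in diameter) but not how the crop region is chosen. Assuming a centered crop, the box to cut before scaling can be computed with plain integer arithmetic. `center_crop_box` is a hypothetical helper; it returns the (left, upper, right, lower) order that Pillow's `Image.crop` expects.

```python
def center_crop_box(width: int, height: int,
                    target_w: int, target_h: int) -> tuple[int, int, int, int]:
    """Largest centered box with the target aspect ratio, as (left, upper, right, lower)."""
    # Compare aspect ratios by cross-multiplying to avoid float rounding.
    if width * target_h > height * target_w:
        # Source is wider than the target: keep the full height, trim the sides.
        crop_h = height
        crop_w = height * target_w // target_h
    else:
        # Source is taller or equal: keep the full width, trim top and bottom.
        crop_w = width
        crop_h = width * target_h // target_w
    left = (width - crop_w) // 2
    upper = (height - crop_h) // 2
    return (left, upper, left + crop_w, upper + crop_h)
```

For a 1600×1200 photo, the 800×533 target gives the box (0, 67, 1600, 1133), a 1600×1066 region that halves to 800×533, while a 100×100 target gives the square (200, 0, 1400, 1200). With Pillow, the box would go to `Image.crop()` followed by `Image.resize()`; the round version would additionally need a circular alpha mask.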

Service Demonstrator

«Service Demonstrator» is a ready-made open-source starting point for building Internet services for text and speech processing at www.corpus.by. It also shows how future services based on it work. The demonstrated service takes text as input; the user selects any checkbox, an option button, and […]

Word Paradigm Generator

The «Word Paradigm Generator» service returns the paradigm of a word. It takes a word or its word forms as input and searches the dictionary for a paradigm; if no ready-made paradigm exists, the user receives a generated paradigm of the entered […]

Thematic Speech Recognizer

The «Thematic Speech Recognizer» service converts speech into electronic text online. The input is a speech recording (phonogram) on a thematic domain, no larger than 20 MB; the output is the recognized text of the recording. The recording can be selected from the provided examples, […]

Table Processor

The «Table Processor» service converts source data into a table and then processes the resulting table. The external interface of the service is shown in Figure 1. The input can be data of interest to the user in a specific format. To get a table with user information, you need to […]

Text-to-Speech Synthesizer

The Internet service «Text-to-Speech Synthesis» (TSS) is a high-quality text processing tool. The system is written in PHP, the free scripting language most widely used on the Internet, and voices Belarusian or Russian texts entered by the user. The Text-to-Speech Synthesizer processes the text automatically and creates an audio file that the user can […]

Sound Recorder

The «Sound Recorder» service records sound directly in the browser, without any additional software. The interface is very simple but provides the minimum set of tools needed for the job. The service can record any sound through your microphone, play it back, and let you download it to a computer or […]

Homograph Identifier

The «Homograph Identifier» service recognizes and highlights homographs in a text. It takes electronic text as input and returns a list of the homographs found in the text, with detailed data on each. Basic terms and concepts. Homonymy – the coincidence of words or their forms with […]

Voiced Electronic Grammatical Dictionary

The «Voiced Electronic Grammatical Dictionary» service provides information on the correct spelling and pronunciation of words. It shows the transcription (in classical form and in IPA) and a detailed description of a word, including the part of speech it belongs to (Figure 1). The service automatically generates a sound file […]

Alphabetizer

The «Alphabetizer» service sorts lines of text alphabetically. The input is arbitrary text or a character sequence in which each line is one unit to be ordered, and the service arranges the input in alphabetical order line by line. It will also allow you to […]

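For Belarusian text, plain code-point sorting is not alphabetical: і (U+0456) and ў (U+045E) come after я (U+044F) in Unicode, although they sit between з and й and between у and ф in the alphabet. The sketch below, assuming one unit per line as described above, orders lines by the 32-letter Belarusian alphabet instead; placing non-letters (spaces, digits, punctuation) before letters by code point is an arbitrary choice made for the example, and `sort_key` and `alphabetize` are hypothetical names.

```python
BELARUSIAN_ALPHABET = "абвгдеёжзійклмнопрстуўфхцчшыьэюя"
RANK = {letter: i for i, letter in enumerate(BELARUSIAN_ALPHABET)}


def sort_key(line: str) -> list[tuple[int, int]]:
    # Letters get (1, position in the alphabet); any other character gets
    # (0, code point), so spaces and punctuation sort before letters.
    return [(1, RANK[ch]) if ch in RANK else (0, ord(ch)) for ch in line.lower()]


def alphabetize(text: str) -> str:
    """Sort the lines of `text` by the Belarusian alphabet (case-insensitive)."""
    return "\n".join(sorted(text.splitlines(), key=sort_key))
```

For example, коўдра now sorts before кош (ў precedes ш) and ліса before лыжы (і precedes ы), whereas a plain `sorted()` call would put both pairs the other way round.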