ALL ACTIVITIES OF THE LAB HAVE BEEN TRANSFERRED TO WWW.SLOVNIK.BG
We are trying to popularize the results of a long-standing effort in
the area of Bulgarian morphology. We believe that such knowledge
should not be the property of a single company, laboratory or group
of people.
Important note: We are not the only laboratory working in this direction, nor even the best one. We have therefore tried to compile an index of the other groups working on similar problems. They can be found here. |
| The Product |
In fact, it is not a single product. We prefer to think of our work as a lexical knowledge base: a large body of information about the morphology of the Bulgarian language. It can be incorporated into the appropriate software interface for each specific task.
So, let's first say what is kept in the knowledge base:
Currently we provide a relatively small number of additional services over text files:
Slovnik was developed with Borland C++ 5.02, making use of OWL NExt. The basic services (the morphology engine) are written in ANSI C++ in order to allow easy portability.
| Availability |
The Slovnik application is available via e-mail for the price of 80 USD. There are special discounts for non-commercial licenses: 30 USD for academic organizations and 40 USD for personal users. Please contact us at diogen.at.diogenes.bg for details about payment and discounts. A fully functional demo version is available without any restrictions or conditions. It works with a restricted lexicon (about 2 500 base forms) and can be downloaded directly.
Please contact us about any special services and processing, which can
be developed under an additional agreement.
| Coming Soon |
Our team continuously develops and improves both the lexical knowledge base and the software. Listed here are only the most significant results that can be expected in the near term.
The basic service that the engine provides is to return the list of all
forms of a word, given any one of its forms. It is multithreaded
software, which allows it to work smoothly and reliably when
accessed by multiple users at the same time.
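The service described above can be illustrated with a minimal sketch. It is not Slovnik's actual code or interface: the class `ToyLexicon`, its methods `addParadigm` and `allForms`, and the transliterated sample forms are all hypothetical, chosen only to show the idea of looking up a whole paradigm from any one of its forms. The sketch assumes that the lexicon is filled once and then only read, in which case several threads can call the `const` lookup at the same time without locking.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy lexicon, for illustration only (not the Slovnik engine API).
class ToyLexicon {
public:
    // Register one paradigm: the complete list of forms of a single word.
    void addParadigm(const std::vector<std::string>& forms) {
        const std::size_t id = paradigms_.size();
        paradigms_.push_back(forms);
        for (const std::string& form : forms) {
            std::vector<std::size_t>& ids = index_[form];
            // A form may repeat inside one paradigm; index the paradigm once.
            if (ids.empty() || ids.back() != id) {
                ids.push_back(id);
            }
        }
    }

    // Return all forms of every word that has `form` among its forms.
    // A homonymous form can belong to several paradigms, so the
    // matching paradigms are concatenated. Unknown forms give an empty list.
    std::vector<std::string> allForms(const std::string& form) const {
        std::vector<std::string> result;
        const auto it = index_.find(form);
        if (it == index_.end()) {
            return result;
        }
        for (const std::size_t id : it->second) {
            const std::vector<std::string>& paradigm = paradigms_[id];
            result.insert(result.end(), paradigm.begin(), paradigm.end());
        }
        return result;
    }

private:
    std::vector<std::vector<std::string>> paradigms_;        // paradigm id -> forms
    std::map<std::string, std::vector<std::size_t>> index_;  // form -> paradigm ids
};
```

With a paradigm such as `{"kniga", "knigata", "knigi", "knigite"}` registered, a call like `allForms("knigite")` would return all four forms. A real engine would presumably store paradigms far more compactly than full form lists; the plain maps here are only the simplest structure that shows the lookup.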
We now have about 20 000 base forms in Russian. As a basis we are using Zaliznjak's Grammatical Dictionary of the Russian Language, which contains about 100 000 words. We expect to reach the same coverage when we complete the development of the Russian lexical knowledge base.
| The Team |
Everything here became possible thanks to the joint expertise and hard work of four men:
| Dimitar Popov - lexicographer, Institute of Bulgarian Language; |
| The Story |
What we present here is rooted in the work on the Bulgarian Language Pronunciation, Spelling and Punctuation Dictionary, which we will call simply "The Dictionary". Here you can learn more about this book, see it and even buy it.
The Dictionary is a special one. It was compiled with the help of a lexical knowledge base (LKB) consisting of a huge number of formal, strict and precise declarative logical statements about the natural language. In fact, the main section of the dictionary, the lexicon (in Bulgarian, "Slovnik"), was generated automatically from the LKB.
This approach gives an obvious twofold advantage:
The "real work" was done by Mr. Popov in cooperation with Mrs. Vidinska: they "encoded" the Bulgarian morphology into the LKB and verified it, using the software shell developed by Kiril Simov.
The above is the history of our linguistic knowledge base: the pure knowledge, the scientific result, the abstract model of the language. In its natural form, it is about as useful for real-world NLP applications as the theory of relativity is for the Space Shuttle project :-)
Let's turn to the software tools. When we started searching for a way to use the LKB for "industrial" purposes, we found ourselves facing a whole new world, with its own rules and regulations. Here we list only some of the major problems we had to resolve in order to make the development of a set of software tools possible. We currently have: