Language Processing @ Work
The NLP Laboratory of Diogenes

ALL ACTIVITIES OF THE LAB HAVE BEEN TRANSFERRED TO WWW.SLOVNIK.BG

We are trying to popularize the results of a long-standing effort in the area of Bulgarian language morphology. We believe that such knowledge should not be the property of a single company, laboratory or group of people.
 
Important note: We are not the only, or even the best, laboratory working in this direction, so we have tried to compile an index of the other groups working on similar problems. They can be found here.

 
The Product

In fact, it is not a single product. We prefer to think about what we do as a lexical knowledge base, a huge amount of information about the morphology of the Bulgarian language. It can be incorporated within the appropriate software interfaces for each specific task.

So, let's first say what is kept in the knowledge base:

What are the basic grammatical operations/services provided:

As a standard viewing tool for our linguistic knowledge base, we offer a Windows 95/98/NT program called Slovnik, which gives access to the above-mentioned services through an easy-to-use graphical user interface. Here is a snapshot of what it looks like. Go to the Availability section to see how to get a copy. This program can be seen as a computer version of the Dictionary.

Currently we provide a relatively small number of additional services over text files:

Our intention is to develop more and more such services, so we are open to all kinds of comments and ideas. However, there are certainly many things that we cannot do, or have not planned to do for the time being: for example, automated translation, syntactic analysis, natural language understanding and many others.

Slovnik was developed with Borland C++ 5.02, making use of OWLNext. The basic services (the morphology engine) are written in ANSI C++ to keep them easily portable.

Availability

The Slovnik application is available via e-mail for the price of 80 USD. There are special discounts for non-commercial licenses: 30 USD for academic organizations and 40 USD for personal users. Please contact us at diogen.at.diogenes.bg for details about payment and discounts. A fully functional demo version is available without any restrictions or conditions. It works with a restricted lexicon (about 2,500 base forms) and can be downloaded directly.

Please contact us about any special services or processing, which can be developed under a separate agreement.
 
Coming Soon

Our team continuously develops and improves both the lexical knowledge base and the software. Listed here are only the most significant results that can be expected in the near future.

The Team

Everything here became possible thanks to the joint expertise and hard work of four men:

The Story

What we present here is rooted in the work on the Bulgarian Language Pronunciation, Spelling and Punctuation Dictionary, referred to here simply as "The Dictionary". Here you can learn more about this book, see it and even buy it.

The Dictionary is a special one. It was compiled with the help of a lexical knowledge base (LKB) consisting of a huge number of formal, strict and precise declarative logical statements about the natural language. In fact, the main section of the dictionary, the lexicon (in Bulgarian, "Slovnik"), was generated automatically from the LKB.

This approach gives an obvious twofold advantage:

The LKB itself was designed by Kiril Simov in the good traditions of Artificial Intelligence. In developing the underlying formal model of the Bulgarian language, Kiril drew on two sources of expertise. The first is his previous work on similar projects with Elena Paskaleva, Mariana Damova, Tanya Avgustinova and Milena Slavcheva. The second, no less important, is his current cooperation with Dimitar Popov, the "non-computational domain expert", a lexicographer and the author of several dictionaries. The result of their work has the advantages of the "computational" approach, refined by the viewpoint of traditional linguistics.

The "real work" was done by Mr. Popov in cooperation with Mrs. Vidinska: they "encoded" the Bulgarian morphology into the LKB and verified it, using the software shell developed by Kiril Simov.

The above is the history of our linguistic knowledge base: the pure knowledge, the scientific result, the abstract model of the language. In its natural form, it is about as useful for real-world NLP applications as the theory of relativity is for the Space Shuttle project :-)

Let's start talking about the software tools. When we started searching for a way to use the LKB for "industrial" purposes, we found ourselves facing a whole new world, with its own rules and regulations. Here we list only some of the major problems we had to resolve in order to make the development of a set of software tools possible. We currently have:

All of the above "userization" we owe to Ognjan Chernokojev, who created the finite state automata representation from scratch, developed the algorithms for its use and manipulation, and wrote the entire C++ implementation. Let's finish the story with a laudatory detail: Ognjan has about ten years of professional experience in both system and application programming in C and C++.
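
To give a rough idea of what a finite state representation of a lexicon can look like, here is a minimal sketch in modern C++. It is not the Slovnik code: the class name LexiconAutomaton, the transliterated word forms and the tag strings are all hypothetical, and the automaton is a plain trie, whereas a production system would typically also minimize the automaton so that common endings share states.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: a lexicon stored as a deterministic finite-state
// automaton (here an unminimized trie). All names are illustrative only.
class LexiconAutomaton {
public:
    LexiconAutomaton() : states_(1) {}  // state 0 is the start state

    // Add a word form together with a grammatical tag.
    void add(const std::string& form, const std::string& tag) {
        assert(!form.empty());  // an empty string is not a valid entry
        int state = 0;
        for (char c : form) {
            auto it = states_[state].next.find(c);
            if (it == states_[state].next.end()) {
                // No transition on c yet: create a fresh state for it.
                int fresh = static_cast<int>(states_.size());
                states_[state].next[c] = fresh;
                states_.emplace_back();
                state = fresh;
            } else {
                state = it->second;
            }
        }
        states_[state].tags.push_back(tag);  // mark the state as accepting
    }

    // Return all tags of a form; an empty result means "not in the lexicon".
    std::vector<std::string> lookup(const std::string& form) const {
        int state = 0;
        for (char c : form) {
            auto it = states_[state].next.find(c);
            if (it == states_[state].next.end()) return {};
            state = it->second;
        }
        return states_[state].tags;
    }

private:
    struct State {
        std::map<char, int> next;       // outgoing transitions
        std::vector<std::string> tags;  // non-empty only in accepting states
    };
    std::vector<State> states_;
};
```

With this sketch, after calling add("kniga", "noun,sg") and add("knigi", "noun,pl"), a call to lookup("kniga") returns the single tag "noun,sg", while lookup("knig") returns nothing because that prefix is not itself an accepted form. Lookup time depends only on the length of the word, not on the size of the lexicon, which is one reason automata are a common choice for this kind of task.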
 