Language Packs

Language Packs are designed for software developers who want to improve how their dtSearch Engine-powered applications search documents in languages other than English.

Language packs include:

Stemming rules and noise-word files

dtSearch products are supplied with stemming rules and a noise-word file for English (US). Stemming is the only search expansion option that is ‘on’ by default in the dtSearch end-user products, because stemming is almost always useful and adds little to the time a search takes.

Unlike some other search engines, dtSearch applies stemming at search time. There is no need to build indexes specifically to apply stemming, and no need to build a separate index for each language in use.

The problem

The dtSearch English stemmer will find plurals and many other word variations; for example, a search on "print" will also find "prints", "printers", "printing", "printed" and "printable". However, if you are searching documents written in other languages, the English stemming rules will miss many word variations that do not occur in English (e.g. verb and noun changes with gender), and may also return unrelated words in error.
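The behaviour described above can be illustrated with a minimal sketch. It is not the dtSearch stemmer and does not use the dtSearch API; the suffix list, the function names and the sample words are all assumptions made for illustration. The sketch expands a query at search time against an unstemmed word list, then shows the same English-style rules missing gender variants in Spanish.

```python
# Toy suffix rules for illustration only; these are NOT dtSearch's stemming rules.
ENGLISH_SUFFIXES = ["able", "ing", "ers", "er", "ed", "s"]


def stem(word, suffixes=ENGLISH_SUFFIXES, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def expand_query(term, index_words, suffixes=ENGLISH_SUFFIXES):
    """Search-time expansion: return every indexed word sharing a stem with term.

    The index itself holds unstemmed words, so no language-specific index is needed.
    """
    target = stem(term, suffixes)
    return sorted(w for w in index_words if stem(w, suffixes) == target)


# English: the variations from the text are all found.
english_index = {"print", "prints", "printers", "printing", "printed", "printable", "paper"}
english_matches = expand_query("print", english_index)
print(english_matches)

# Spanish (hypothetical sample): English rules handle the plural "s"
# but know nothing about the masculine/feminine endings "o" and "a".
spanish_index = {"rojo", "roja", "rojos", "rojas"}
spanish_matches = expand_query("rojo", spanish_index)
print(spanish_matches)
```

In this sketch the Spanish search for "rojo" returns only "rojo" and "rojos"; the feminine forms "roja" and "rojas" are missed because the toy English rules have no notion of gender endings.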

In addition, the English noise word list, which is designed to keep your index small by removing unwanted English words, is not suitable for other languages. Common words from those languages will remain in your indexes, where they add to the index size without being useful in searches. Worse, the common English noise word "the" will stop you from getting results if you search French documents for tea (thé)!
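A minimal sketch of the noise-word problem follows. It is not dtSearch code: the word lists are small illustrative subsets, the function names are hypothetical, and the accent folding is an assumption made so that "thé" and "the" compare equal, as the tea example implies.

```python
import unicodedata

# Illustrative subsets only; real noise-word files are much longer.
ENGLISH_NOISE_WORDS = {"the", "a", "an", "and", "of", "to"}
FRENCH_NOISE_WORDS = {"le", "la", "les", "de", "un", "une"}


def fold_accents(word):
    """Remove diacritics (assumed accent-insensitive matching): 'thé' -> 'the'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(text, noise_words):
    """Return the words that would be kept in the index after noise-word removal."""
    words = [fold_accents(w) for w in text.lower().split()]
    return [w for w in words if w not in noise_words]


french_text = "Le thé vert"

# With the English list, the French word for tea is discarded as noise
# while the French article "le" is kept.
with_english_list = index_terms(french_text, ENGLISH_NOISE_WORDS)
print(with_english_list)

# With a French list, the article is dropped and the tea survives.
with_french_list = index_terms(french_text, FRENCH_NOISE_WORDS)
print(with_french_list)
```

With the English list the indexed terms are "le" and "vert", so a search for tea finds nothing; with the French list they are "the" and "vert".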

The solution

Use language-specific files in place of the default US English files. Language Extension Packs contain files for many languages in Unicode format; see the list of supported languages.

LEP500 Language Extension Pack includes “Russian Plus” stemming rules; these combine Cyrillic and Latin rules to improve search recall in document collections containing Russian and another language. The language pairs supported are:

Russian plus Czech
Russian plus Estonian
Russian plus Finnish
Russian plus German
Russian plus Greek
Russian plus Hungarian
Russian plus Latvian
Russian plus Lithuanian
Russian plus Polish
Russian plus Slovak
Russian plus Swedish

LEP500 Language Extension Pack can also be licensed for large-volume use or wider distribution in your own application; see the Developer Price List.

* Stemming Selector 4 and User Thesaurus Plus can also be licensed separately on a per-user basis for use with dtSearch Desktop or dtSearch Network; however, that license is for end-user use only. Distribution with an application that uses the dtSearch Engine requires an LEP500 developer license.

LEP500 Language Extension Pack
3-Server license $375 including one year of support & updates

The license covers installation on up to 3 servers or workstations for use with dtSearch Engine or dtSearch Web, or on up to 15 workstations for use with dtSearch Desktop or dtSearch Network.

For wider distribution, contact us for a quotation.