Stemming Language Selector

Stemming Language Selector is supplied as part of a Language Extension Pack (LEP500 series) for use by developers.

Language-specific stemming is essential for achieving the high recall on non-English data that professionals in the legal, law enforcement, intelligence, and other serious research fields demand.

  •  Generates stemming rules on demand. Set the target path to a writable location for the stemming file.
  •  To change the stemming language, select it from the drop-down list before running a search.
  •  Generates stemming rules for multiple languages in the same format as the previous LEP400 series.
  •  “Always on top” option for convenience.

Language List

Western European (Latin script)

Danish
Dutch
English
Finnish
French + English*
German
German + English*
Italian
Norwegian
Portuguese
Spanish
Swedish

Eastern European

Belarusian
Bosnian
Bulgarian
Croatian
Czech
Estonian
Greek
Hungarian
Latvian
Lithuanian
Polish
Romanian
Russian
Russian Plus***
Serbian**
Slovak
Slovenian
Turkish
Ukrainian
Uzbek**

NOTES

As of 4 January 2022, SLS500 is no longer available for use with dtSearch end-user products.

Notes to Language List:

* Unique bilingual French/English and German/English stemming and noise word files that enable search expansion on indexes and documents containing a mix of French and English, or German and English, text.

** Supports Cyrillic and Latin scripts simultaneously.

*** “Russian Plus” stemming rules combine Cyrillic and Latin rules to improve search recall in document collections containing Russian and another language. These files are exported from the command line.
The language pairs supported are:

Russian plus Czech
Russian plus Estonian
Russian plus Finnish
Russian plus German
Russian plus Greek
Russian plus Hungarian
Russian plus Latvian
Russian plus Lithuanian
Russian plus Polish
Russian plus Slovak
Russian plus Swedish
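
Why combining Cyrillic and Latin rules improves recall on mixed collections can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the suffix lists are toy examples, and the function names (make_stemmer, matches) are hypothetical. It does not reproduce the rules shipped in the pack or the stemming file format.

```python
# Toy suffix lists, for illustration only. They are NOT the rules
# shipped in the Language Extension Pack.
RUSSIAN_SUFFIXES = ["ами", "ов", "ам", "а", "у", "ы"]
GERMAN_SUFFIXES = ["en", "er", "es", "e", "s"]


def make_stemmer(suffixes, min_stem=3):
    """Build a naive suffix-stripping stemmer (hypothetical helper)."""
    # Try longer suffixes first so "ами" wins over "и"-style short endings.
    ordered = sorted(suffixes, key=len, reverse=True)

    def stem(word):
        w = word.lower()
        for suffix in ordered:
            # Strip only if a stem of reasonable length remains.
            if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
                return w[: -len(suffix)]
        return w

    return stem


def matches(stem, query, words):
    """Return the words whose stem equals the stem of the query."""
    target = stem(query)
    return [w for w in words if stem(w) == target]


# One stemmer with Russian rules only, one with both rule sets combined.
russian_only = make_stemmer(RUSSIAN_SUFFIXES)
russian_plus_german = make_stemmer(RUSSIAN_SUFFIXES + GERMAN_SUFFIXES)

words = ["документы", "документов", "Berichte", "Berichten"]

print(matches(russian_only, "Bericht", words))
print(matches(russian_plus_german, "Bericht", words))
print(matches(russian_plus_german, "документ", words))
```

With the Russian-only rules, the query "Bericht" finds none of the German word forms, because no Latin suffix is ever stripped. With the combined rule set, the same query matches both German forms, while the Cyrillic forms still match "документ" as before.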