Indexing and Search Techniques
dtSearch Desktop/Network is a powerful search tool used by professionals for a wide variety of tasks. These short training courses show you how to carry out tasks that often prove tricky, even for experienced search professionals. All are in PDF format.
T201 Searching for money
How to use the User Thesaurus Plus add-on product to improve precision and recall when searching for sums of money in specific currencies.
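User Thesaurus Plus works inside dtSearch, and its rules are not reproduced here. As a rough, tool-independent illustration of what a currency-aware pattern has to cope with (a symbol or ISO code before or after the number, thousands separators, optional decimals), here is a minimal Python sketch. The pattern, the three currencies it recognises and the `find_amounts` name are assumptions made for this example only.

```python
import re

# Hypothetical amount pattern: either digits with at least one comma group
# (1,250 or 1,250.00) or plain digits (300 or 42.50).
AMOUNT = r"\d{1,3}(?:,\d{3})+(?:\.\d{2})?|\d+(?:\.\d{2})?"

# A currency symbol before the amount, or an ISO code after it.
# Only $, £, € and USD/GBP/EUR are handled in this sketch.
MONEY = re.compile(
    rf"(?:(?P<sym>[$£€])\s?(?P<a1>{AMOUNT})"
    rf"|(?P<a2>{AMOUNT})\s?(?P<code>USD|GBP|EUR))\b"
)


def find_amounts(text):
    """Return (currency, amount) pairs found in text, in order of appearance."""
    results = []
    for m in MONEY.finditer(text):
        currency = m.group("sym") or m.group("code")
        amount = m.group("a1") or m.group("a2")
        # Strip thousands separators before converting to a number.
        results.append((currency, float(amount.replace(",", ""))))
    return results
```

For example, `find_amounts("Paid $1,250.00 and 300 EUR, not 42 apples.")` picks out the dollar and euro amounts and ignores the bare number. A real thesaurus-based approach also needs to cover spelled-out forms ("ten pounds") and regional separators, which a single regular expression handles poorly.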
T205 Search for translators
How to index and search Microsoft TBX and CSV glossaries. Although aimed at translators, the techniques are also useful for other professionals who need to index very large text files.
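dtSearch handles the indexing itself; the sketch below only illustrates why streaming matters when a glossary file is very large. It assumes an older-style TBX file (a `martif` root containing `termEntry`, `langSet` and `term` elements, with no default XML namespace), which is not guaranteed for every glossary; TBX v3 files use different element names. `iter_tbx_terms` is a hypothetical helper, not part of dtSearch.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_tbx_terms(source):
    """Yield (entry_id, language, term) tuples, one termEntry at a time.

    source may be a file path or a binary file object. Each termEntry is
    cleared after processing so memory use stays flat on large files.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "termEntry":
            continue
        entry_id = elem.get("id")
        for lang_set in elem.iter("langSet"):
            lang = lang_set.get(XML_LANG)
            for term in lang_set.iter("term"):
                if term.text:
                    yield entry_id, lang, term.text.strip()
        elem.clear()  # release the children of the finished entry
```

Reading entry by entry, rather than loading the whole tree, is the design choice that lets this kind of script cope with multi-hundred-megabyte terminology files.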
T206 Searching for Numeric Patterns
How to search for numeric patterns, such as IPv4 or IPv6 addresses, using wildcards or regular expressions, either in a one-time search or to find all such patterns in a collection of files. Optional: how to use the User Thesaurus Plus 1.2 add-on product, which includes IPv4, IPv6 and Bitcoin macros.
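Outside dtSearch, a common way to find addresses is a two-step approach: a deliberately loose regular expression finds candidates, and a strict parser rejects anything that only looks like an address (such as `999.1.1.1` or a time like `12:30:45`). The following Python sketch shows that idea using the standard `ipaddress` module; the patterns and the `find_ip_addresses` name are assumptions for illustration and are not the dtSearch or User Thesaurus Plus syntax.

```python
import ipaddress
import re

# Loose candidate patterns; ipaddress performs the strict validation.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
IPV6_CANDIDATE = re.compile(
    r"(?<![0-9A-Fa-f:])(?:[0-9A-Fa-f]{0,4}:){2,7}[0-9A-Fa-f]{0,4}(?![0-9A-Fa-f:])"
)


def find_ip_addresses(text):
    """Return valid IPv4 addresses, then valid IPv6 addresses, found in text."""
    found = []
    for pattern in (IPV4_CANDIDATE, IPV6_CANDIDATE):
        for m in pattern.finditer(text):
            try:
                found.append(str(ipaddress.ip_address(m.group())))
            except ValueError:
                pass  # looked like an address but is not a valid one
    return found
```

This sketch does not handle IPv4-mapped IPv6 forms such as `::ffff:192.0.2.1`, and `str()` returns IPv6 addresses in compressed form, which may differ from how they were written in the source text.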
T207 Identifying duplicates
How to use the Search for List of Words function, use MD5 hashing to identify duplicate documents, and copy selected files to another location. Requires dtSearch Desktop version 7.88 or later.
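The principle behind hash-based duplicate detection is that files with identical bytes produce identical MD5 digests, so grouping files by digest reveals exact duplicates. The course uses dtSearch's own features for this; the Python sketch below is an independent illustration of the same idea, and the function names are hypothetical.

```python
import hashlib
import shutil
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by MD5 digest; keep only groups of 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[md5_of_file(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def copy_files(paths, destination):
    """Copy files (with metadata) into destination, creating it if needed.

    Files with the same name overwrite each other in this simple sketch.
    """
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    for p in paths:
        shutil.copy2(p, dest / p.name)
```

Note that an MD5 match only identifies byte-identical files: a Word document and a PDF of the same text, or two files differing by a single space, will have different digests.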
T209 Searching for People
How to search for people by name using proximity operators, Soundex, the user thesaurus and a “list of words”. These techniques are common in genealogy, investigative journalism, anti-money laundering and law enforcement. (Optional: how to use the User Thesaurus Plus add-on to search for people by name.)
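Soundex reduces a name to its first letter plus three digits so that names which sound alike, such as Robert and Rupert, share a code. The following Python sketch implements the widely published American Soundex rules (vowels and Y separate letters with the same code; H and W do not). It is an illustration only: dtSearch's sound-alike matching may not follow exactly these rules, and the sketch assumes ASCII names.

```python
# Standard American Soundex digit groups.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex code for name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")  # first letter's code still suppresses repeats
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:
                result += code
            prev = code
        elif ch not in "hw":
            prev = ""  # a vowel or y separates letters with the same code
        # h and w are skipped without resetting prev
        if len(result) == 4:
            break
    return result.ljust(4, "0")
```

Comparing `soundex(candidate) == soundex("Robert")` across a list of names would, for instance, match both "Rupert" and "Robert" (R163) but not "Rubin" (R150), which is why Soundex is usually combined with other techniques rather than used alone.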
Useful Links
Searching Microsoft TBX files with dtSearch (YouTube Video)
dtSearch add-ons for translators (IDMS.com)
Developer links
Code Project: Developer articles
Articles
Language matters – Cyrillic search
Unicode and Text Retrieval (dtSearch Corp)
The Art of the text query (dtSearch Corp)
Crossing the full text fielded data divide
Unindexed v indexed searching and more (dtSearch Corp)
The Best Kept Secrets to Using Keyword Search Technologies
Part 1: A Comparison of dtSearch & Lucene