What libraries provide case-insensitive, exact substring matching in Node.js against a large corpus of strings? I'm specifically looking for index-based solutions.
As an example, consider a corpus consisting of millions of strings:
"Abc Gef gHi"
"Def Ghi xYz"
- …
I need a library such that a search for "C ge"
returns the first string above, but a search for "C   ge"
(note the multiple spaces) does not. In other words, I'm not looking for fuzzy, intelligent, full-text search with stemming and stop words; rather, I want the simplest (and fastest) exact substring matcher with an index that works at large scale.
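To make the required semantics concrete, here is a naive, non-indexed reference written as a linear scan. The function name `findExactInsensitive` is only illustrative, and lowercasing both sides is my assumption about what "case-insensitive" should mean here. This is exactly what does not scale to millions of strings; I'm looking for an index-backed equivalent that returns the same results.

```javascript
// Naive reference for the desired behavior: a linear scan with no index.
// `findExactInsensitive` is a hypothetical name used only for illustration.
function findExactInsensitive(corpus, query) {
  // Lowercase both sides; whitespace is left untouched, so matching stays exact.
  const needle = query.toLowerCase();
  return corpus.filter((s) => s.toLowerCase().includes(needle));
}

const corpus = ["Abc Gef gHi", "Def Ghi xYz"];

console.log(findExactInsensitive(corpus, "C ge"));   // [ 'Abc Gef gHi' ]
console.log(findExactInsensitive(corpus, "C   ge")); // [] (extra spaces do not match)
```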
Solutions in JavaScript are welcome, and so are solutions in C (as they can be turned into a native Node.js module). Alternatively, solutions in other programming languages such as Java are also possible; they can be used through the command line. Preferably, solutions are disk-space-bound rather than memory-bound (e.g., preferably not Redis), and they should write an index to disk so that subsequent startup time is low.
The problem with most solutions I found (such as the ones here) is that they are too intelligent: they apply different kinds of stemming or normalization, so the matches are not exact.
Thanks in advance for your help!