How do I search many text files for a string extremely quickly?

Question

I'm working on a program for my debate team, and one of its features will be searching through text files for certain keywords. Since there is always limited time to prepare speeches in debate, speed is my absolute top priority, but the search methods I've tried so far aren't fast enough. The fastest approach I've tried is running grep on each file. It technically works, but there are about 2500 files to search, so even at roughly 5 milliseconds per file, the time adds up quickly when searching for multiple keywords or running new searches as the user needs them.

What I really need is a way to avoid searching every document on each query, something that would cut down the number of documents the program has to look through. Does anyone know if something like that is possible? If not, could anyone point me toward something to research that would cut down the search time in other ways?

https://stackoverflow.com/questions/9452701/ukkonens-suffix-tree-algorithm-in-plain-english — Elliott Frisch, Mar 28 '18 at 20:51
categorize the files into subsets so you don't have to search all 2500+ files — RAZ_Muh_Taz, Mar 28 '18 at 21:07

score 0 · Answer 1 · edited Mar 29 '18 at 07:50

I think you are looking for a text search engine, and I believe Apache Lucene will help you. You can create an index of all your files based on their content, then quickly search that index for the words and sentences you are interested in; Lucene will tell you which files match best. Store the index in a file so you don't have to re-create it every time you start searching, and only extend it when a new document arrives. Lucene can do even more for you, because it can also search for similar words (like Google does). Describing how to use the Lucene engine is, I think, out of scope for this short answer, but I believe you will find a nice intro at this link: http://www.lucenetutorial.com/sample-apps/textfileindexer-java.html
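To make the index idea concrete, here is a minimal sketch in plain Python. It is not Lucene and does not use Lucene's API; it only illustrates an inverted index that maps each word to the files containing it, saved to disk so it is built once and reused. The function names, the *.txt filter, and the JSON index file are assumptions made for this illustration.

```python
import json
import re
from collections import defaultdict
from pathlib import Path

# Assumption: a "word" is a run of lowercase letters, digits, or apostrophes.
TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN.findall(text.lower())


def build_index(folder):
    """Map each word to the sorted list of .txt files that contain it."""
    index = defaultdict(set)
    for path in Path(folder).rglob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for word in set(tokenize(text)):
            index[word].add(str(path))
    return {word: sorted(paths) for word, paths in index.items()}


def save_index(index, index_file):
    """Write the index to disk so it does not have to be rebuilt each run."""
    Path(index_file).write_text(json.dumps(index), encoding="utf-8")


def load_index(index_file):
    """Read a previously saved index back into memory."""
    return json.loads(Path(index_file).read_text(encoding="utf-8"))


def search(index, query):
    """Return the files that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)
```

A lookup then touches only the index entries for the query words rather than all 2500 files. This sketch matches whole words only; for an exact phrase you would still need to scan the (much smaller) set of candidate files it returns, and ranking or similar-word search is where a real engine like Lucene comes in.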

score 0 · Answer 2 · answered Mar 29 '18 at 11:05

Either use Lucene or some kind of index as stated by Vicctor.

Or, see other grep-like solutions:

ignore some files if possible
Fastest possible grep <- Interesting
https://beyondgrep.com/feature-comparison/

Or, if you want to learn how to code, try doing it yourself!
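If you do try it yourself, one simple starting point is to read each file once and check all keywords in that single pass, instead of running a separate grep per file and per keyword. The sketch below is only an illustration under that assumption; scan_files and its parameters are hypothetical names, and the caller decides which paths to pass in, which is also where files can be skipped.

```python
from pathlib import Path


def scan_files(paths, keywords):
    """Read each file once and report which keywords appear in it.

    Matching is a case-insensitive substring test, so "tax" also
    matches inside "taxes".
    """
    wanted = [k.lower() for k in keywords]
    hits = {}
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore").lower()
        found = [k for k in wanted if k in text]
        if found:
            hits[str(path)] = found
    return hits
```

This still reads every file it is given, so it does not replace an index, but it avoids re-reading the same file for each keyword.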
