I have code that works but is extremely slow. It determines whether a string contains a keyword, and it needs to be efficient enough to search for hundreds of keywords in thousands of documents.
How can I find the keywords efficiently without falsely matching a word that merely contains the keyword?
For example:
String keyword="ac";
String document = "..."; // a file a few pages long
If I use:
if (document.contains(keyword)) {
//do something
}
It also returns true if the document contains a word like "account".
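For illustration, the false positive is easy to reproduce, and for a single keyword a whole-word check does not strictly need a regex. The sketch below uses String.indexOf and only accepts an occurrence whose neighbouring characters are not letters. It assumes a "word" is a run of ASCII letters (the same [A-Za-z] definition the regex in this question uses) and, like the original code, it is case-sensitive. The class and method names are hypothetical.

```java
// Sketch: whole-word check without regex.
// Assumption: a "word" is a run of ASCII letters, i.e. the [A-Za-z] definition.
class WholeWordIndexOf {

    // True if c counts as part of a word under the assumption above.
    static boolean isWordChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Returns true if keyword occurs in document with no letter directly before or after it.
    static boolean containsWord(String document, String keyword) {
        if (keyword.isEmpty()) {
            return false;
        }
        int from = 0;
        while (true) {
            int i = document.indexOf(keyword, from);
            if (i < 0) {
                return false;                              // no more occurrences
            }
            int end = i + keyword.length();
            boolean startOk = i == 0 || !isWordChar(document.charAt(i - 1));
            boolean endOk = end == document.length() || !isWordChar(document.charAt(end));
            if (startOk && endOk) {
                return true;                               // standalone occurrence found
            }
            from = i + 1;                                  // keep scanning after this occurrence
        }
    }

    public static void main(String[] args) {
        String document = "Please check your account balance.";
        System.out.println(document.contains("ac"));              // true (matches inside "account")
        System.out.println(containsWord(document, "ac"));         // false
        System.out.println(containsWord("Open the AC unit", "AC")); // true
    }
}
```

Unlike the [^A-Za-z] version, this also accepts a keyword at the very start or end of the document, because a missing neighbour is treated as a boundary.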
So I tried to use a regular expression as follows:
String pattern = "(.*)([^A-Za-z]"+ keyword +"[^A-Za-z])(.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(document);
if (m.find()) {
//do something
}
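A likely reason this pattern is so slow: with find(), the leading (.*) first grabs the rest of the line and then backtracks looking for the keyword, and when that fails the whole attempt is repeated from the next start position, so the work can grow roughly quadratically with line length. The [^A-Za-z] parts also demand a character on both sides, so a keyword at the very start or end of the document is missed. Below is a hedged sketch of one common fix (not necessarily the version from the answers): drop the (.*) groups, express the boundaries as lookarounds using the same [A-Za-z] definition, escape the keyword with Pattern.quote, and compile each pattern once so it can be reused for every document. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch: compile one pattern per keyword, once, and reuse it for every document.
// Assumption: a "word" is a run of ASCII letters, matching the [A-Za-z] class in the question.
class KeywordPattern {

    // (?<![A-Za-z]) : no letter immediately before the keyword (also true at the start of input)
    // (?![A-Za-z])  : no letter immediately after the keyword (also true at the end of input)
    // Pattern.quote : treats any regex metacharacters in the keyword literally
    static Pattern wholeWord(String keyword) {
        return Pattern.compile("(?<![A-Za-z])" + Pattern.quote(keyword) + "(?![A-Za-z])");
    }

    public static void main(String[] args) {
        Pattern p = wholeWord("ac");                      // compile once...
        String[] documents = {
            "Please check your account balance.",
            "ac is at the very start",
            "turn on the ac"
        };
        for (String document : documents) {
            // ...reuse for every document; find() scans without the leading (.*)
            System.out.println(p.matcher(document).find()); // prints false, true, true
        }
    }
}
```

Java's \b could be used instead of the lookarounds, but by default it treats digits and underscores as word characters too, so it would not match exactly the same cases as [^A-Za-z].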
Summary (hopefully it will be useful to someone else):
- My regular expression works in principle but is impractical on large data (it didn't terminate).
- @anubhava improved the regular expression. It was easy to understand and implement, and it terminated, which is a big step, but it was still slow (roughly 240 seconds).
- @Tomalak's solution is a bit more complex to implement and understand, but it was the fastest (18 seconds), so hats off, mate.
So @Tomalak's solution was roughly 13 times faster than @anubhava's (240 s vs. 18 s).
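The answers' code is not reproduced here, so as a separate, hedged sketch of one alternative for hundreds of keywords across thousands of documents: instead of scanning each document once per keyword, split each document into words once and check every keyword with a HashSet lookup. The cost per document is then roughly one pass over the text plus one hash lookup per keyword. This assumes every keyword is a single word made only of ASCII letters; multi-word or punctuated keywords would need a different approach (for example one combined regex or an Aho-Corasick automaton). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch (an alternative approach, not necessarily the one from the answers):
// tokenize each document once, then look keywords up in a hash set.
// Assumption: keywords are single words of ASCII letters, the [A-Za-z] definition.
class KeywordIndex {

    // Collects the distinct letter-only words of a document.
    static Set<String> wordsOf(String document) {
        Set<String> words = new HashSet<>();
        for (String token : document.split("[^A-Za-z]+")) {
            if (!token.isEmpty()) {              // split can yield a leading empty string
                words.add(token);
            }
        }
        return words;
    }

    // Returns the keywords that appear as whole words in the document, in the given order.
    static List<String> foundKeywords(String document, List<String> keywords) {
        Set<String> words = wordsOf(document);   // one pass over the document
        List<String> found = new ArrayList<>();
        for (String keyword : keywords) {
            if (words.contains(keyword)) {       // hash lookup instead of rescanning the text
                found.add(keyword);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("ac", "balance", "unit");
        String document = "Please check your account balance; the ac unit is off.";
        System.out.println(foundKeywords(document, keywords)); // [ac, balance, unit]
    }
}
```

The lookup is case-sensitive like the original code; lower-casing both the words and the keywords would make it case-insensitive.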