I have a set of sentences in a file (say 500). I am trying to find whether a pair of words (say word1 and word2) is present in any of the sentences. I have 58000 such pairs of words.
For example, let the set of sentences be:
I am a good boy. He is a bad boy. I am a very good boy.
Pair of words to search: am, good
So this should return the first and last sentence as output.
I am using the following regex:
for match in re.finditer(r'([ A-Za-z0-9]*)\b{string1}\b([^\.!?]*)\b{string2}\b([^\.!?]*[\.!?])'.format(string1=word1, string2=word2), sentence_set.lower(), re.S):
This statement does the job but takes a long time: more than 8 minutes.
Then I removed the regex and used plain loops instead: I split each sentence into words and checked whether both words are present. This took much less time, under 2 minutes.
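The loop-based version looks roughly like the sketch below. It is only an illustration: the function names `build_word_sets` and `sentences_with_pair` are made up for this post, the sentences are assumed to be already split into a list, and the word sets are built once up front, which may differ from the exact loops I ran. Note that, unlike the regex, it does not require word1 to come before word2.

```python
import string

def build_word_sets(sentences):
    # Build one lowercase word set per sentence, once, so that each of the
    # many pair lookups is just two set-membership tests.
    return [
        {token.strip(string.punctuation) for token in sentence.lower().split()}
        for sentence in sentences
    ]

def sentences_with_pair(sentences, word_sets, word1, word2):
    # Keep every sentence whose word set contains both words (order is ignored).
    return [
        sentence
        for sentence, words in zip(sentences, word_sets)
        if word1 in words and word2 in words
    ]

sentences = ["I am a good boy.", "He is a bad boy.", "I am a very good boy."]
word_sets = build_word_sets(sentences)
result = sentences_with_pair(sentences, word_sets, "am", "good")
print(result)
```

With 58000 pairs, `build_word_sets` would be called once and `sentences_with_pair` once per pair.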
So it seems that regex can be very slow at times. Is that true? Is there any way to improve the speed?