I'm trying to search for all occurrences of a group of words in a paragraph with the following regex:

'(?i)\W(the|if|but|then|is|with)\W'

The search returns results, but some occurrences of the words are missed.
For example, in a test paragraph with 3 occurrences of the word 'the', this regex finds only the second occurrence and misses the first and third.

Why does this regex fail to find every occurrence of these words in the paragraph?

afshin
  • You should use a word boundary, `'(?i)\b(the|if|but|then|is|with)\b'`, to enable consecutive matches. – Wiktor Stribiżew Apr 12 '20 at 15:45
  • When I replace `\W` with `\b`, the regex doesn't find any words. The returned matches are empty. – afshin Apr 12 '20 at 15:52
  • Ah, so you run it in Python. Use the raw string literal. `r'(?i)\b(the|if|but|then|is|with)\b'`. See [Python regular expression match whole word](https://stackoverflow.com/questions/15863066/python-regular-expression-match-whole-word) – Wiktor Stribiżew Apr 12 '20 at 15:56
  • Thanks, that fixed it. I had missed the `r` prefix. – afshin Apr 12 '20 at 15:59
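
To make the comments above concrete, here is a minimal sketch of the three variants side by side, assuming Python's standard `re` module. The sentence in `text` and the variable names are made up for illustration and are not the original test paragraph. `\W` has to consume an actual non-word character on each side, so a word at the very start of the text, or one whose leading space was already consumed by the previous match, is skipped. `\b` is zero-width and has no such problem, but in a non-raw string literal Python reads `\b` as a backspace character, so the pattern matches nothing.

```python
import re

# Hypothetical test sentence: "the" occurs three times.
text = "The fox saw the hen, but the hen ran."

words = "the|if|but|then|is|with"

# \W consumes one non-word character before and after each word.
# "The" at position 0 has nothing before it, and the space after "but"
# is consumed by that match, so the following "the" cannot match.
consuming = re.findall(r"(?i)\W(" + words + r")\W", text)

# \b is a zero-width word boundary: it consumes nothing,
# so matches at the start of the text and back-to-back matches are found.
boundary = re.findall(r"(?i)\b(" + words + r")\b", text)

# Without the r prefix, "\b" is a backspace character (\x08),
# so the pattern looks for literal backspaces and finds nothing.
no_raw = re.findall("(?i)\b(" + words + ")\b", text)

print(consuming)  # ['the', 'but']
print(boundary)   # ['The', 'the', 'but', 'the']
print(no_raw)     # []
```

Using `re.finditer` instead of `re.findall` would also give the position of each match, which can help when checking which occurrences were skipped.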
