
I have been spending time on this regex but I can't get it to work. I need to match a set of words in a phrase, but when one of those words occurs as part of a particular word sequence, I do not want it to be captured. For example:

phrase: Hi, I am talking about a recall on the product I bought last month. If I recall correctly, I purchased this at your store on august 15th. Can you tell me if I can get a refund on this recall?

The result should match the first recall and the last recall, but it should not match 'If I recall', since those three words together don't refer to the product recall.

I tried different variations of the pattern below but couldn't get it to work. It matches every 'recall' term.

(?<!If\sI\srecall).*?(recalls?|recalled).*?(?!If\sI\srecall)

I am using Python 3.10 to test this. Any help would be appreciated.

Karthik
  • If the regex becomes too complex, one possible solution would be to first replace all occurrences of `If I recall` with `NOT_INTERESTING`, and then look for the strings you are interested in. – Eric Duminil Aug 21 '23 at 21:20
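The replace-then-search idea from this comment could be sketched roughly as follows. This is only an illustration: the variable names and the exact patterns are assumptions, and `NOT_INTERESTING` is just the placeholder suggested in the comment.

```python
import re

text = (
    "Hi, I am talking about a recall on the product I bought last month. "
    "If I recall correctly, I purchased this at your store on august 15th. "
    "Can you tell me if I can get a refund on this recall?"
)

# Step 1: mask the phrase that should not count as a product recall.
masked = re.sub(r"\bIf\sI\srecall\b", "NOT_INTERESTING", text)

# Step 2: search the masked text with a simple pattern, no lookbehind needed.
found = re.findall(r"\brecall(?:s|ed)?\b", masked)
print(found)  # ['recall', 'recall']
```

One trade-off of this approach: match positions refer to the masked string, so they no longer line up with offsets in the original text.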

2 Answers


If you want to match recall, recalls, or recalled only when it is not directly preceded by "If I ":

(?<!\bIf\sI\s)\brecall(?:s|ed)?\b

The pattern matches:

  • (?<!\bIf\sI\s) Negative lookbehind: assert that "If I " is not directly to the left
  • \brecall(?:s|ed)? Starting at a word boundary, match one of the words recall, recalls, or recalled
  • \b A word boundary
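As a quick check, the pattern above can be run against the phrase from the question. This is a minimal sketch; the variable names are illustrative and not part of the answer.

```python
import re

text = (
    "Hi, I am talking about a recall on the product I bought last month. "
    "If I recall correctly, I purchased this at your store on august 15th. "
    "Can you tell me if I can get a refund on this recall?"
)

# Negative lookbehind: skip "recall" when "If I " comes immediately before it.
pattern = re.compile(r"(?<!\bIf\sI\s)\brecall(?:s|ed)?\b")

# Collect (start offset, matched word) for every match in the original text.
matches = [(m.start(), m.group()) for m in pattern.finditer(text)]
print([word for _, word in matches])  # ['recall', 'recall']
```

Note that the lookbehind is case-sensitive as written, so a lowercase `if I recall` would still be matched unless `re.IGNORECASE` is passed.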


The fourth bird

You can try splitting the text into sentences and keeping only those in which the pattern matches:

import re

txt = """Hi, I am talking about a recall on the product I bought last month. If I recall correctly, I purchased this at your store on august 15th. Can you tell me if I can get a refund on this recall?"""

# match recall/recalls/recalled unless "If I " is directly before it
pat = re.compile(r"(?<!If I )(?:recalls?|recalled)", flags=re.IGNORECASE)

# split the text into sentences ending in ., ? or !
for phrase in re.findall(r".*?[.?!]", txt):
    # keep only sentences that contain an allowed match
    if pat.search(phrase):
        print(phrase)

Prints:

Hi, I am talking about a recall on the product I bought last month.
 Can you tell me if I can get a refund on this recall?
Andrej Kesely
  • This solution is great if I want to exclude the phrases that I don't want to match. This may actually come in handy for me in other problems. Thank you! – Karthik Aug 22 '23 at 16:45
  • Thanks. I always forget that it's possible to omit escaping in brackets, as in `[.?!]`. Is there any problem with leaving the escapes in, e.g. `[\.\?!]`? – Eric Duminil Aug 22 '23 at 21:23
  • Yes, for `.`, `?`, `!` the escape character can be omitted inside brackets. Of course, with `^`, `-`, `]` or `\\` you need to be more careful. More here: https://stackoverflow.com/questions/19976018/does-a-dot-have-to-be-escaped-in-a-character-class-square-brackets-of-a-regula – Andrej Kesely Aug 22 '23 at 21:25
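The point about escaping inside brackets can be checked with a small sketch. The sample strings below are made up for illustration.

```python
import re

sample = "First one. Second one? Third one! trailing"

# Inside a character class, ., ? and ! are literal with or without a backslash.
unescaped = re.findall(r".*?[.?!]", sample)
escaped = re.findall(r".*?[\.\?!]", sample)
print(unescaped == escaped)  # True

# A hyphen is different: unescaped between two characters it denotes a range.
as_range = re.findall(r"[a-z]", "a-z b")     # letters a through z
as_literal = re.findall(r"[a\-z]", "a-z b")  # only 'a', '-' and 'z'
print(as_range, as_literal)
```

So the extra backslashes in `[\.\?!]` are harmless, while for `-` the escape (or its position in the class) changes what is matched.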