I am trying to extract up to three words before and after a given word using a regex in Python. It works in most cases, but it fails when the given word occurs twice within the three-word window, as in the snippet below (the given word is "hello").
import re
new_text = "I am going to say hello and hello to him"
re.findall(r"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,3})(hello)((?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,3})", new_text)
Expected Output:
[('going to say ', 'hello', ' and hello to'), ('say hello and ', 'hello', ' to him')]
Actual Output:
[('going to say ', 'hello', ' and hello to')]
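From my research, this happens because the regex consumes the text it matches: re.findall returns only non-overlapping matches, and the first match already includes the second "hello" in its trailing context, so that occurrence is never matched on its own. I need to capture the surrounding region for each occurrence because I will be doing additional processing on it.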
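To make the consumption issue concrete, and to sketch one possible way around it, the snippet below first checks that the span of the first match already covers the second "hello". It then locates each occurrence separately with re.finditer and expands the context around it. This is a minimal sketch, not a definitive solution: the helper name context_windows and the WORD and GAP constants are hypothetical, and it assumes the same word definition as the pattern above.

```python
import re

# Assumption: same word / separator definition as the original pattern.
WORD = r"[a-zA-Z'-]+"
GAP = r"[^a-zA-Z'-]+"

new_text = "I am going to say hello and hello to him"

# Diagnostic: the first match's span swallows the second "hello",
# so findall (non-overlapping) never starts a match there.
original = r"((?:%s%s){0,3})(hello)((?:%s%s){0,3})" % (WORD, GAP, GAP, WORD)
first = re.search(original, new_text)
second_hello_start = new_text.index("hello", first.end(2))
swallowed = second_hello_start < first.end()


def context_windows(text, keyword, n=3):
    """Hypothetical helper: (before, keyword, after) for every occurrence."""
    # Up to n "word + separator" groups that end exactly at the end of the prefix.
    before_re = re.compile(r"(?:%s%s){0,%d}\Z" % (WORD, GAP, n))
    # Up to n "separator + word" groups starting right after the keyword.
    after_re = re.compile(r"(?:%s%s){0,%d}" % (GAP, WORD, n))
    results = []
    for m in re.finditer(re.escape(keyword), text):
        before = before_re.search(text[:m.start()]).group(0)
        after = after_re.match(text, m.end()).group(0)
        results.append((before, m.group(0), after))
    return results


windows = context_windows(new_text, "hello")
# [('going to say ', 'hello', ' and hello to'), ('say hello and ', 'hello', ' to him')]
```

Because each occurrence is located on its own and the context is matched in a separate step, the windows are allowed to overlap. Slicing the prefix for every occurrence copies part of the string each time, which is fine for short inputs but may be worth revisiting for very long texts.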
Any advice on how to proceed would be greatly appreciated (regex or non-regex).
Thanks!