
I need help matching the two words "hello" and "hope" in mystring, but only counting the first occurrence in the string. The max distance they could be from each other is 5 words. Appreciate any help!

mystring = "hello bob nice weather hope you have a good day. hello jan hope weather is nice"

This is what I have so far. I want the result to contain only the first occurrence of "hello" followed by "hope", and to stop matching after that.

pattern = re.findall('\bhello(?:\W+\w+){0,5}\W+hope\b', mystring)

Wiktor Stribiżew
Dcook
  • Does it have to be regex? Probably easier to solve with tokenization in the mix. – Matt L. Sep 14 '20 at 02:21
  • Yes it can be tokenization @MattL. – Dcook Sep 14 '20 at 02:22
  • In that case, try it with tokenization, or simply remove the punctuation and `split` the input. The regex is hard to read and maintain, as you've already learned. – Prune Sep 14 '20 at 02:29
  • Could you help with the tokenization? @Prune – Dcook Sep 14 '20 at 02:30
  • Please repeat [on topic](https://stackoverflow.com/help/on-topic) and [how to ask](https://stackoverflow.com/help/how-to-ask) from the [intro tour](https://stackoverflow.com/tour). "Show me how to solve this coding problem?" is off-topic for Stack Overflow. You have to make an honest attempt at the solution, and then ask a *specific* question about your implementation. – Prune Sep 14 '20 at 03:03
  • Can you give an example? – Johnny Sep 14 '20 at 03:21
  • `"\b"` is a backspace char. You need `r"\b"`. – Wiktor Stribiżew Jan 23 '21 at 21:59
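
The last comment can be checked directly. The following is a minimal sketch, not the asker's or any commenter's code: it shows that `"\b"` in a normal string literal is the backspace character, and that with a raw string the pattern from the question matches as intended. Using `re.search` instead of `re.findall` is an assumption about what "only the first occurrence" means here; `first_match` is a name introduced for illustration.

```python
import re

mystring = "hello bob nice weather hope you have a good day. hello jan hope weather is nice"

# In a normal string literal, "\b" is the backspace character (U+0008),
# so the regex engine never sees a word-boundary assertion.
assert "\b" == "\x08"
print(re.findall("\bhello", mystring))   # no backspace in mystring, so nothing matches
print(re.findall(r"\bhello", mystring))  # raw string: \b is a word boundary

# Same pattern as in the question, written as a raw string.
pattern = r"\bhello(?:\W+\w+){0,5}\W+hope\b"

# re.search stops at the first match, unlike re.findall, which collects all of them.
first_match = re.search(pattern, mystring)
if first_match:
    print(first_match.group(0))
```

`re.search` returns `None` when nothing matches, hence the guard before calling `.group(0)`.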

1 Answer

I don't know how to do this in a single line of code with regex, but you can do part of it with regex and then filter the matches with one additional line that uses a list comprehension.

import re
mystring = "hello bob nice weather hope you have a good day. hello jan hope weather is nice"
# raw string so that \W and \w reach the regex engine unchanged
pattern = re.findall(r'hello(?:\W+\w+){0,5}\W+hope', mystring)
pattern

['hello bob nice weather hope', 'hello jan hope']


# keep only the matches that span exactly 5 words (the length of the first match here)
new_pattern = [x for x in pattern if len(x.split()) == 5]
new_pattern

['hello bob nice weather hope']
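
The length filter above works for this input because the first match happens to be five words long. As an alternative along the lines of the tokenization suggested in the comments, here is a rough sketch that does not depend on the match length. The helper `first_pair_within` and its `max_gap` parameter are hypothetical names, and it assumes that "first occurrence" means the first "hello" in the string and that up to 5 words may sit between the two words, as in the regex.

```python
import re

def first_pair_within(text, first, second, max_gap=5):
    """Return the words from the first `first` up to the next `second`,
    or None if `second` is not reached with at most `max_gap` words in between."""
    words = re.findall(r"\w+", text)  # crude tokenization: runs of word characters
    if first not in words:
        return None
    start = words.index(first)  # index() only looks at the first occurrence
    # `second` may follow after 0..max_gap intermediate words
    window = words[start + 1 : start + max_gap + 2]
    if second not in window:
        return None
    end = start + 1 + window.index(second)
    return words[start : end + 1]

mystring = "hello bob nice weather hope you have a good day. hello jan hope weather is nice"
result = first_pair_within(mystring, "hello", "hope")
print(result)
```

Like the regex, this sketch ignores sentence boundaries, so a "hope" in the following sentence still counts if it is close enough.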
David Erickson