I am trying to match two words/terms, but only when both occur inside the same sentence. In this test case the words are "pattern" and "sentence". For example, given this text:

Find the pattern when it is. In the same sentence. Find the pattern when it is in the same sentence.

The regex should only find a match in the last sentence.

As you can see in my Regex101 test, `(?P<Capture>pattern.*?sentence).*?\.` finds a match anyway, starting in the first sentence:

https://regex101.com/r/xDIU5q/2

As far as I understand, I have asked the regex to match non-greedily until it finds a period, but that doesn't seem to limit the match to a single sentence.
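Here is a small Python check of what that regex actually does. Python's `re` supports the same `(?P<name>...)` syntax, and the variable names are just for illustration. The lazy `.*?` only changes how *few* characters are consumed, not *which* characters may be consumed. Because `.` also matches a literal period, the match that starts at the first "pattern" simply extends past the period until it reaches the first "sentence". Swapping `.` for the negated class `[^.]` is one way to keep a match inside a single sentence:

```python
import re

text = ("Find the pattern when it is. In the same sentence. "
        "Find the pattern when it is in the same sentence.")

# The question's regex: "." inside ".*?" also matches a literal period,
# so the lazy quantifier can still cross a sentence boundary.
original = re.compile(r"(?P<Capture>pattern.*?sentence).*?\.")
m = original.search(text)
print(m.group("Capture"))  # pattern when it is. In the same sentence

# "[^.]" never matches a period, so a match cannot span two sentences.
fixed = re.compile(r"(?P<Capture>pattern[^.]*?sentence)[^.]*\.")
print([x.group("Capture") for x in fixed.finditer(text)])
# ['pattern when it is in the same sentence']
```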

user3649739

1 Answer

https://regex101.com/r/V1aGaY/4

Pattern: `((?<=\s)[^.]+(pattern)[^.]+(sentence)(?=\.)|(?<=\s)[^.]+(sentence)[^.]+(pattern)(?=\.))`

Test string: Find the pattern when it is. In the same sentence. Find the pattern when it is in the same sentence. Find the sentence when it is in the same pattern.

I believe this should get you what you need. It assumes that each sentence is preceded by whitespace (that is what the `(?<=\s)` lookbehind checks); that can easily be changed, just let me know.
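A minimal Python sketch that runs the pattern above, unchanged, against the test string with `re.finditer`. The variable names are illustrative. `group(1)` is the outer capture group, so each result is the sentence without its period, because the period is only checked by the `(?=\.)` lookahead and never consumed. The last line illustrates the point raised in the comments: as written, the second keyword has to be the final word before the period.

```python
import re

# The answer's pattern, unchanged: one alternative per word order.
answer = re.compile(
    r"((?<=\s)[^.]+(pattern)[^.]+(sentence)(?=\.)"
    r"|(?<=\s)[^.]+(sentence)[^.]+(pattern)(?=\.))"
)

text = ("Find the pattern when it is. In the same sentence. "
        "Find the pattern when it is in the same sentence. "
        "Find the sentence when it is in the same pattern.")

# group(1) is the outer group: the matched sentence, minus its period.
for m in answer.finditer(text):
    print(m.group(1))
# Find the pattern when it is in the same sentence
# Find the sentence when it is in the same pattern

# "sentence" is not directly followed by "." here, so there is no match.
print(answer.search(" Find the pattern in a sentence today."))  # None
```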

aaaa
  • Your lookahead finds the dot, but the way you have it, the dot must immediately follow the word "sentence". Maybe you don't need the lookahead at all: `[^\.]+(pattern)[^\.]+(sentence)[^\.]+\.` – Jerry Jeremiah Aug 24 '20 at 00:01
  • You're right in the sense that the lookahead isn't necessarily needed. I admit fully that they might want to keep the period or full-stop. If that's the case, I'll amend my post. I just didn't really think it was a big deal... – aaaa Aug 24 '20 at 00:07
  • I wasn't complaining about whether the dot was kept or not. I was just saying that the regex requires the word "sentence" to be the last word in the sentence. – Jerry Jeremiah Aug 24 '20 at 00:22
  • Ah, I see your point. I can update my answer. – aaaa Aug 24 '20 at 00:24
  • Why the escaped `[^\.]` and the class `[\s]`? They add unnecessary complexity. – TonyR Aug 24 '20 at 05:15
  • Fixed, I think. Thanks. – aaaa Aug 24 '20 at 13:02
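Here is a Python sketch of the direction the comments point to. It takes Jerry Jeremiah's `[^\.]+(pattern)[^\.]+(sentence)[^\.]+\.` and changes the quantifiers to `*`. As written, the trailing `[^\.]+` needs at least one character between "sentence" and the period, so it would not match "...in the same sentence.". The sketch also keeps the answer's alternation for the reverse word order. The assumptions are that sentences end with "." only and that the words are matched as plain substrings without `\b` word boundaries, so "patterns" would also count. The names are illustrative.

```python
import re

text = ("Find the pattern when it is. In the same sentence. "
        "Find the pattern when it is in the same sentence. "
        "Find the sentence when it is in the same pattern. "
        "Find the pattern in a sentence today.")

# "*" instead of "+": the words may sit anywhere in the sentence,
# including directly before the period. "[^.]" never matches a period,
# so a match cannot span two sentences.
either_order = re.compile(
    r"[^.]*pattern[^.]*sentence[^.]*\."
    r"|[^.]*sentence[^.]*pattern[^.]*\."
)

# A match starts right after the previous period, so strip() drops
# the leading space; the period itself is kept in the result.
for m in either_order.finditer(text):
    print(m.group().strip())
# Find the pattern when it is in the same sentence.
# Find the sentence when it is in the same pattern.
# Find the pattern in a sentence today.
```

Unlike the `(?<=\s)` version, this also works for a sentence at the very start of the string, because a leading `[^.]*` can match there without any preceding whitespace.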