
I want to match certain words only in the context of other words. For example, to capture a filling when we're talking about sandwiches, I could use:

(?:sandwich|toastie).{0,100}(ham|cheese|pickle)

This would match something like "Andy sat down to enjoy his sandwich which, unusually for him, was filled with delicious ham".

However, this would also match across "context breaks" such as end-of-sentence punctuation or line breaks, e.g. "Victorians enjoyed a good sandwich after work. They also enjoyed cheese rolling." In this case I'd want the match to fail, because it crosses a sentence boundary.
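A quick way to see the problem is to run the original pattern against both example sentences. This is a minimal sketch using Python's re module (the question does not name a regex flavor, so Python is an assumption here); the variable names are illustrative.

```python
import re

# The pattern from the question: a trigger word, up to 100 arbitrary
# characters, then a captured filling word.
NAIVE = re.compile(r"(?:sandwich|toastie).{0,100}(ham|cheese|pickle)")

wanted = "Andy sat down to enjoy his sandwich which, unusually for him, was filled with delicious ham"
unwanted = "Victorians enjoyed a good sandwich after work. They also enjoyed cheese rolling."

# Both sentences match, because '.' happily consumes the ". " sentence break.
print(NAIVE.search(wanted).group(1))    # ham
print(NAIVE.search(unwanted).group(1))  # cheese
```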

So I tried (?:sandwich|toastie)(?:\w\. ){0}.{0,100}(ham|cheese|pickle), but that doesn't work. What I'm imagining is something like [^\w\. ], but that isn't right either.
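Why neither attempt works: a quantifier of {0} repeats the group zero times, so it always matches the empty string and never rejects anything. A negated character class such as [^\w\. ] matches exactly one character, so it cannot forbid a multi-character sequence. A small sketch illustrating both points, again assuming Python's re:

```python
import re

# {0} repeats the group zero times: it matches only the empty string.
print(re.fullmatch(r"(?:\w\. ){0}", "") is not None)     # True
print(re.fullmatch(r"(?:\w\. ){0}", "k. ") is not None)  # False

# So adding it to the pattern changes nothing: the break is still crossed.
WITH_ZERO = re.compile(r"(?:sandwich|toastie)(?:\w\. ){0}.{0,100}(ham|cheese|pickle)")
text = "Victorians enjoyed a good sandwich after work. They also enjoyed cheese rolling."
print(WITH_ZERO.search(text).group(1))  # cheese

# A negated class tests ONE character that is not a word char, dot or space.
print(re.findall(r"[^\w\. ]", "work. They, too"))  # [',']
```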

Frayt

2 Answers


Since you want that sample string to be rejected, you need to use a tempered greedy token instead, and write your regex like this:

(?:sandwich|toastie)(?:(?!\w\. ).){0,100}(ham|cheese|pickle)


Because you want the match to fail whenever the (?:\w\. ) pattern appears between the two words, write (?:(?!\w\. ).) instead of a bare dot. Each character is then consumed only if a word character followed by a dot and a space does not start at that position, so the words from the two groups will not be matched across two different sentences.
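The pattern above can be checked the same way. This is a sketch in Python's re (an assumption, as above). It also shows one limitation worth knowing: the lookahead only runs at positions after the trigger word, so a period that immediately follows "sandwich" (whose final word character belongs to the trigger) is not recognised as a break.

```python
import re

# Each character between trigger and filling is consumed only if a
# "word character, dot, space" sequence does not start at that position.
TEMPERED = re.compile(r"(?:sandwich|toastie)(?:(?!\w\. ).){0,100}(ham|cheese|pickle)")

wanted = "Andy sat down to enjoy his sandwich which, unusually for him, was filled with delicious ham"
unwanted = "Victorians enjoyed a good sandwich after work. They also enjoyed cheese rolling."

print(TEMPERED.search(wanted).group(1))  # ham
print(TEMPERED.search(unwanted))         # None: blocked at "k. " in "work. "

# Limitation: the \w before the dot is part of "sandwich" itself, so this
# sentence break is not detected and the match still succeeds.
print(TEMPERED.search("I ate a sandwich. It had cheese.").group(1))  # cheese
```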

Pushpesh Kumar Rajwanshi

You could make use of a tempered greedy token with a negative lookahead that asserts what follows is not any of the listed words, a word character followed by a dot and one or more spaces, or a newline:

(?:sandwich|toastie)(?:(?!(?:ham|cheese|pickle|\w\. +|(?:\r?\n|\r))).){1,100}(?:ham|cheese|pickle)

Explanation

  • (?:sandwich|toastie) Match one of the options
  • (?: Non-capturing group
    • (?! Negative lookahead to prevent over matching, assert what follows is not
      • (?:ham|cheese|pickle|\w\. +|(?:\r?\n|\r)) Match any of the options: a filling word, a word character followed by a dot and one or more spaces, or a line break
    • ). Close negative lookahead and match any character
  • ){1,100} Close non-capturing group and repeat 1 to 100 times
  • (?:ham|cheese|pickle) Match one of the options
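A sketch of this pattern in Python's re (an assumed flavor), showing that it stops at the first filling word and refuses to cross a sentence break or a line break. The pattern string is split across lines only for readability; Python concatenates adjacent string literals.

```python
import re

# The pattern from this answer; the filling group is non-capturing, so use group(0).
PATTERN = re.compile(
    r"(?:sandwich|toastie)"
    r"(?:(?!(?:ham|cheese|pickle|\w\. +|(?:\r?\n|\r))).){1,100}"
    r"(?:ham|cheese|pickle)"
)

samples = [
    "Andy sat down to enjoy his sandwich which, unusually for him, was filled with delicious ham",
    "Victorians enjoyed a good sandwich after work. They also enjoyed cheese rolling.",
    "a toastie\nwith pickle",          # line break between the words
    "a sandwich with ham and cheese",  # two fillings: the match stops at the first
]
for s in samples:
    m = PATTERN.search(s)
    print(repr(m.group(0)) if m else None)
```

The second and third samples print None; the last one matches only 'sandwich with ham', because the lookahead also refuses to step over a filling word.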


You might consider adding word boundaries, as in \b(?:sandwich|toastie)\b and \b(?:ham|cheese|pickle)\b, to prevent the words from matching as part of a larger word.
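A sketch of the word-boundary variant, assuming Python's re. Beyond what the answer describes, it also puts \b around the filling words inside the lookahead (an assumption of this sketch), so that a word like "hammer" neither ends the match early nor stops the scan. The helper build is hypothetical.

```python
import re

TRIGGER = r"(?:sandwich|toastie)"
FILLING = r"(?:ham|cheese|pickle)"
BREAK = r"(?:\w\. +|\r?\n|\r)"


def build(boundaries: bool) -> re.Pattern:
    """Hypothetical helper: the answer's pattern, optionally with word boundaries."""
    b = r"\b" if boundaries else ""
    trigger = f"{b}{TRIGGER}{b}"
    filling = f"{b}{FILLING}{b}"
    # Tempered token: consume one character unless a filling word or a break starts here.
    token = f"(?:(?!(?:{filling}|{BREAK})).)"
    return re.compile(trigger + token + "{1,100}" + filling)


text = "a sandwich next to the hammer, filled with cheese"
print(build(False).search(text).group(0))  # sandwich next to the ham
print(build(True).search(text).group(0))   # sandwich next to the hammer, filled with cheese
```

With boundaries on, "sandwiches" no longer counts as a trigger, since \bsandwich\b requires a non-word character after the final "h".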

The fourth bird