
I'm using the regex pattern

[^?!.\s][^?!.]*?\b([Cc]at|[Dd]og|[Bb]ird)\b[^?!.]*[.?!]

to match an entire sentence that contains any of the words listed in the pattern, even if the sentence spans multiple lines.

However, I've found that if the word of interest is the first in the sentence, it will not match.

For example, "The bird is dead." will match, but "Dog days are over." will not. The sentences I'm looking for are often grammatically incomplete, like the second one, but they still begin with a capital letter and end with a period.
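A minimal sketch that reproduces the behaviour described above, assuming a JavaScript regex engine (the question does not name one). The `g` flag and the variable names are added only for illustration. The leading `[^?!.\s]` always consumes the first character of the sentence, so the `\b` before the keyword can never sit at the very start of the sentence:

```javascript
// Pattern from the question; the g flag is added only to collect every match.
const original = /[^?!.\s][^?!.]*?\b([Cc]at|[Dd]og|[Bb]ird)\b[^?!.]*[.?!]/g;

const text = "The bird is dead. Dog days are over.";

// String.prototype.match returns null when nothing matches, hence the fallback.
const matches = text.match(original) || [];

// Only the first sentence is found: in "Dog days are over." the "D" is
// consumed by [^?!.\s], and no keyword can start after that point.
console.log(matches); // [ 'The bird is dead.' ]
```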

  • Glad that worked for you. Note there can also be a way with a lookahead, but it will probably require a more resource consuming pattern. – Wiktor Stribiżew Dec 08 '21 at 09:49

1 Answer


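You can use either of

(?!\s)[^?!.]*?\b([Cc]at|[Dd]og|[Bb]ird)\b[^?!.]*[.?!]
\b[^?!.]*?\b([Cc]at|[Dd]og|[Bb]ird)\b[^?!.]*[.?!]

In the first regex, the first matched char must be a non-whitespace char because (?!\s) is a negative lookahead that matches only at a location that is not immediately followed by a whitespace char. Unlike the [^?!.\s] in your pattern, the lookahead does not consume that char, so the following \b can still match right before a keyword at the start of the sentence.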

The \b in the second variant is more specific: it matches a position between the start of the string or a non-word char and a word char, or between a word char and a non-word char or the end of the string.
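The difference between the two variants can be sketched in JavaScript as follows. This assumes the lookahead variant is written with the negative lookahead `(?!\s)`; the sample text and variable names are made up for illustration. Both variants now pick up a sentence that starts with a keyword, but they differ when the sentence starts with a non-word char such as an opening parenthesis:

```javascript
// Variant 1: a negative lookahead forbids starting on whitespace without consuming anything.
const lookaheadVariant = /(?!\s)[^?!.]*?\b([Cc]at|[Dd]og|[Bb]ird)\b[^?!.]*[.?!]/g;
// Variant 2: the match must start at a word boundary.
const boundaryVariant = /\b[^?!.]*?\b([Cc]at|[Dd]og|[Bb]ird)\b[^?!.]*[.?!]/g;

// Hypothetical sample: the last sentence spans two lines and starts with "(".
const sample = "The bird is dead. Dog days are over.\nNo pets here. (Cat\nnaps) help!";

const lookaheadMatches = sample.match(lookaheadVariant) || [];
const boundaryMatches = sample.match(boundaryVariant) || [];

console.log(lookaheadMatches);
// [ 'The bird is dead.', 'Dog days are over.', '(Cat\nnaps) help!' ]
console.log(boundaryMatches);
// [ 'The bird is dead.', 'Dog days are over.', 'Cat\nnaps) help!' ]
```

The lookahead variant keeps the leading "(" because any non-whitespace char may start the match, while the `\b` variant starts at the first word char.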

Note that in JavaScript the \b word boundary is not Unicode-aware. If you need full Unicode word boundary support, you will need a workaround; see "Replace certain arabic words in text string using Javascript".
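As a rough illustration of the limitation, here is one possible workaround (not necessarily the one used in the linked answer). It assumes an engine with ES2018 lookbehind and Unicode property escapes, such as a current Node.js. The word "кот" (Russian for "cat") and the variable names are hypothetical examples:

```javascript
// Without the i flag, \b is defined via \w, which covers only [A-Za-z0-9_],
// so there is no boundary next to Cyrillic letters, even with the u flag.
const asciiBoundary = /\bкот\b/u;
const asciiResult = asciiBoundary.test("кот спит");

// Hypothetical replacement for \b: lookarounds that reject an adjacent
// Unicode letter, digit, or underscore on either side of the word.
const unicodeBoundary = /(?<![\p{L}\p{N}_])кот(?![\p{L}\p{N}_])/u;
const wholeWord = unicodeBoundary.test("кот спит");
const insideLongerWord = unicodeBoundary.test("котик");

console.log(asciiResult);      // false
console.log(wholeWord);        // true
console.log(insideLongerWord); // false
```

The same lookarounds could stand in for each `\b` around the keyword group if the keywords themselves contained non-ASCII letters.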

Wiktor Stribiżew