
I tried the answer from the post "Regex. Find paragraph containing some word", which in my case would be

((?!\n\n).)*(cat)

but this doesn't work.


How can I use PCRE2 regular expressions (PHP >= 7.3) to match all paragraphs in my text that contain the word "cat", where each paragraph is separated by two consecutive line breaks (It is allowed to have one line break in a paragraph but not two)?

For example, if the input text is as follows:

Paragraph 1 wepfowfpo
fww efwf

Paragraph 2 wefwf32321
!@d r33tcat54, 333!..

Paragraph 3 4t4t022
-`121231ere3r3cat342232
$ 4t0g cat rdwd203  
$$333

Paragraph 4 222cocdo3

Then the desired output is:

Paragraph 3 4t4t022
-`121231ere3r3cat342232
$ 4t0g cat rdwd203  
$$333

I tried to use something like \n\n.*(?=cat)cat.*\n\n, but this matches only the lines that contain "cat".

ZENG

1 Answer

How about splitting the string into paragraphs and matching those containing cat?

preg_grep('/\bcat\b/i', explode("\n\n", $str));

See this PHP demo at tio.run - The word boundary \b prevents matching the cat in tcat5.
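
For reference, here is a minimal runnable sketch of the split-and-filter idea applied to the sample text from the question. The variable names ($str, $paragraphs, $matches) are only illustrative, and it assumes paragraphs are separated by exactly "\n\n".

```php
<?php
// Sample text from the question; paragraphs are separated by a blank line.
$str = "Paragraph 1 wepfowfpo\nfww efwf\n\n"
     . "Paragraph 2 wefwf32321\n!@d r33tcat54, 333!..\n\n"
     . "Paragraph 3 4t4t022\n-`121231ere3r3cat342232\n\$ 4t0g cat rdwd203  \n\$\$333\n\n"
     . "Paragraph 4 222cocdo3";

// Split on blank lines, then keep only paragraphs containing the whole word "cat".
$paragraphs = explode("\n\n", $str);
$matches = preg_grep('/\bcat\b/i', $paragraphs);

// preg_grep() preserves the original array keys, so reindex if that matters.
$matches = array_values($matches);

print_r($matches);
```

With the sample text, only Paragraph 3 should remain. If the input may use \r\n line endings, something like preg_split('/\R{2,}/', $str) could replace explode() here, but that is an assumption about the input rather than part of the question.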


If you can't use PHP functions, here is a regex-only idea that uses (?m) multiline mode.

^(?:.+\n)*.*?\bcat\b.*(?:\n.+)*

See this demo at regex101 - Additionally, add the i flag to ignore case (to also match e.g. Cat).
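
As a rough sketch of how this pattern might be used from PHP, the following applies it with preg_match_all() and the m and i flags to the sample text from the question. The variable names ($str, $pattern, $m, $expected) are illustrative only.

```php
<?php
// Sample text from the question; paragraphs are separated by a blank line.
$str = "Paragraph 1 wepfowfpo\nfww efwf\n\n"
     . "Paragraph 2 wefwf32321\n!@d r33tcat54, 333!..\n\n"
     . "Paragraph 3 4t4t022\n-`121231ere3r3cat342232\n\$ 4t0g cat rdwd203  \n\$\$333\n\n"
     . "Paragraph 4 222cocdo3";

// m: ^ also matches at each line start; i: ignore case.
$pattern = '/^(?:.+\n)*.*?\bcat\b.*(?:\n.+)*/mi';

// $m[0] holds the full text of every matching paragraph.
preg_match_all($pattern, $str, $m);

// The paragraph the question lists as the desired output.
$expected = "Paragraph 3 4t4t022\n-`121231ere3r3cat342232\n\$ 4t0g cat rdwd203  \n\$\$333";

var_dump($m[0] === [$expected]);
```

Paragraph 2 is skipped because the cat inside r33tcat54 has no word boundary on either side.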

Regex explained:
(?m) flag for multiline mode, so that ^ also matches at the start of each line
^(?:.+\n)* at the ^ line start, repeat the (?: non-capturing group ) any number of times (*); it contains:
.+ greedily matches one or more characters up to the \n newline; this part matches the lines before the one containing cat
(if available, using an atomic group instead of a non-capturing group can be more efficient here: demo)
.*?\bcat\b.* where .*? lazily matches any characters up to \bcat\b (using word boundaries) and .* matches the rest of the line
(?:\n.+)* matches any remaining lines in the paragraph, where .+ prevents skipping over \n\n
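
One possible reading of the atomic-group remark above is to swap (?:.+\n) for (?>.+\n), so the engine cannot backtrack inside an already matched leading line. The linked demo may differ, so treat this as a sketch; the variable names are illustrative and no performance figures are claimed.

```php
<?php
// Sample text from the question; paragraphs are separated by a blank line.
$str = "Paragraph 1 wepfowfpo\nfww efwf\n\n"
     . "Paragraph 2 wefwf32321\n!@d r33tcat54, 333!..\n\n"
     . "Paragraph 3 4t4t022\n-`121231ere3r3cat342232\n\$ 4t0g cat rdwd203  \n\$\$333\n\n"
     . "Paragraph 4 222cocdo3";

// Original pattern with a non-capturing group for the leading lines.
$plain  = '/^(?:.+\n)*.*?\bcat\b.*(?:\n.+)*/mi';

// Assumed variant: each leading line is matched atomically. The * can still
// give back whole lines, but the engine will not retry shorter .+ matches.
$atomic = '/^(?>.+\n)*.*?\bcat\b.*(?:\n.+)*/mi';

preg_match_all($plain, $str, $a);
preg_match_all($atomic, $str, $b);

// Both variants should return the same paragraphs for this input.
var_dump($a[0] === $b[0]);
```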
bobble bubble