New solution
Since your goal is to remove the matching divs, as mentioned in the comment, using the original regex with preg_split plus implode would be the simpler solution:
implode('', preg_split('~<div class="foo">.*?</div>~is', $text))
Demo on ideone
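As a quick check of the one-liner, here is a minimal sketch. The sample string in $text is made up for illustration, and the second div deliberately spans a line break to exercise the s flag.

```php
<?php
// Hypothetical sample input, invented for this illustration.
$text = 'before<div class="foo">drop me</div>middle<div class="foo">
drop me too</div>after';

// preg_split returns the pieces of text that lie between the matches.
$pieces = preg_split('~<div class="foo">.*?</div>~is', $text);

// implode glues the surviving pieces back together without a separator.
$result = implode('', $pieces);

echo $result, "\n";
```

With this input the pieces are the three text runs around the two divs, so the divs themselves disappear from the joined string.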
Old solution
I'm not sure whether this is a good idea, but here is my solution:
~(.*?)(?:<div class="foo">.*?</div>|$)~is
Demo on regex101
The result can be picked out from capturing group 1 of each match.
Note that the last match is always an empty string, and there can also be an empty-string match between two adjacent matching divs, or at the start if the string begins with a matching div. However, you need to concatenate the pieces anyway, so this is a non-issue.
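A minimal sketch of collecting and concatenating group 1 with preg_match_all follows. The input string is hypothetical and was chosen to start with a matching div and to contain two adjacent ones, so the empty captures described above show up.

```php
<?php
// Hypothetical input: starts with a matching div and has two adjacent divs.
$text = '<div class="foo">a</div><div class="foo">b</div>keep 1'
      . '<div class="foo">c</div>keep 2';

// Group 1 holds the text before each matching div (or before end of string).
preg_match_all('~(.*?)(?:<div class="foo">.*?</div>|$)~is', $text, $matches);

// $matches[1] lists every group-1 capture, including the empty ones.
// Joining them drops the divs and keeps everything else.
$result = implode('', $matches[1]);

echo $result, "\n";
```

The empty captures contribute nothing to the joined string, which is why they do not need to be filtered out.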
The idea is to rely on the fact that the lazy quantifier .*? will always try the sequel (whatever comes after it) first before advancing itself, resulting in something similar to a look-ahead assertion that makes sure that whatever is matched by .*? will not be inside <div class="foo">.*?</div>.
The div tag is matched along with the preceding text in each match in order to advance the cursor past the closing tag. $ is used to match the text after the last matching div.
The s flag makes . match any character, including line separators.
Revision: I had to change .+? to .*?, since .+? fails to handle strings with two matching divs next to each other and strings that start with a matching div.
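The difference between the two quantifiers can be seen with a small comparison. This is only a sketch: the helper name stripWith and the sample string are hypothetical. Because .+? must consume at least one character, it swallows the opening < of a div that sits at the very start of a match attempt, and that div then ends up inside group 1.

```php
<?php
// Hypothetical helper: concatenates group 1 of every match of $pattern.
function stripWith(string $pattern, string $text): string
{
    preg_match_all($pattern, $text, $matches);
    return implode('', $matches[1]);
}

// Hypothetical input that starts with a matching div.
$text = '<div class="foo">a</div>keep';

$withPlus = stripWith('~(.+?)(?:<div class="foo">.*?</div>|$)~is', $text);
$withStar = stripWith('~(.*?)(?:<div class="foo">.*?</div>|$)~is', $text);

echo "with .+?: ", $withPlus, "\n"; // the leading div survives
echo "with .*?: ", $withStar, "\n"; // the leading div is removed
```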
Anyway, it's not a good idea to modify HTML with regular expressions. Use a parser instead.
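For comparison, a parser-based version might look like the sketch below, using PHP's DOM extension. The markup, the wrapping root element, and the XPath expression are all assumptions made for illustration. The XPath matches only a class attribute that is exactly "foo", mirroring the regex, and the exact serialized output can differ between libxml versions.

```php
<?php
// Hypothetical markup, wrapped in one root element so libxml keeps it intact.
$html = '<div id="root"><p>keep</p><div class="foo">drop me</div>'
      . '<p>keep too</p></div>';

$doc = new DOMDocument();
// These flags ask libxml not to add <html>/<body> wrappers or a doctype.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($doc);
// Copy the node list into an array first, then remove each matching div.
foreach (iterator_to_array($xpath->query('//div[@class="foo"]')) as $node) {
    $node->parentNode->removeChild($node);
}

$result = $doc->saveHTML();
echo $result;
```

Unlike the regex, this handles nested divs inside the matched element correctly, since the whole subtree is removed with its parent node.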