PHP - Preg match reversal?

Question

How do you invert a regular expression in PHP?

This is my code:

preg_match("!<div class=\"foo\">.*?</div>!is", $source, $matches);

This checks the $source string for everything within the container and stores it in the $matches variable.

But what I want to do is reverse the expression, i.e. I want to get everything that is NOT inside the container. I know there is something called negative lookahead, but I am really bad with regular expressions and didn't manage to come up with a working solution.

Simply putting ?! in front of the pattern:

preg_match("?!<div class=\"foo\">.*?</div>!is", $source, $matches);

does not seem to work.

Thanks!

Sure, a sample input would be: _"Lorem ipsum
I want to be excluded!
dolor sit"_, and the output would be _"Lorem ipsum dolor sit"_ — Frank, May 13 '15 at 10:10
@Frank: Whatever you do, the output from a regex function will not be contiguous - you need to concatenate the pieces together. — nhahtdh, May 13 '15 at 10:21
@tchrist: The other question doesn't quite apply. It has a similar name but deals with a different problem altogether. — nhahtdh, May 14 '15 at 04:22
@nhahtdh Ok, I was about to fix it but it looks like that's been taken care of. — tchrist, May 14 '15 at 04:26

score 1 · Answer 1 · edited May 23 '17 at 11:52

New solution

Since your goal is to remove the matching divs, as mentioned in the comment, using the original regex with preg_split, followed by implode, is the simpler solution:

implode('', preg_split('~<div class="foo">.*?</div>~is', $text))

Demo on ideone
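
As a minimal sketch of the one-liner above: the $text value is an assumed stand-in for the asker's sample, with the container markup written out, and the variable names are hypothetical. It also cross-checks against preg_replace with an empty replacement, which should give the same result when the only goal is to drop the divs.

```php
<?php
// Hypothetical sample input: the text from the comments with the container spelled out.
$text = "Lorem ipsum <div class=\"foo\">\nI want to be excluded!\n</div>dolor sit";

// Split on every matching div, then glue the remaining pieces back together.
$pieces = preg_split('~<div class="foo">.*?</div>~is', $text);
$result = implode('', $pieces);

// preg_replace() with an empty replacement is an alternative way to drop the divs.
$viaReplace = preg_replace('~<div class="foo">.*?</div>~is', '', $text);

var_dump($result === $viaReplace);
echo $result, "\n";
```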

Old solution

I'm not sure whether this is a good idea, but here is my solution:

~(.*?)(?:<div class="foo">.*?</div>|$)~is

Demo on regex101

The result can be picked out from capturing group 1 of each match.

Note that group 1 of the last match is always an empty string, and there can be an empty-string match between two adjacent matching divs or when the string starts with a matching div. However, you need to concatenate the pieces anyway, so this is a non-issue.
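
A short sketch of how the pieces might be collected with preg_match_all, assuming a made-up input string (the variable names are illustrative only):

```php
<?php
// Hypothetical input with one matching div in the middle.
$text = 'Lorem ipsum <div class="foo">I want to be excluded!</div>dolor sit';

// Group 1 of each match holds the text in front of a div,
// or in front of the end of the string for the final stretch.
preg_match_all('~(.*?)(?:<div class="foo">.*?</div>|$)~is', $text, $matches);

// $matches[1] lists group 1 of every match, including the trailing empty one.
$result = implode('', $matches[1]);

var_dump($matches[1]);
echo $result, "\n";
```

Because the empty strings contribute nothing to implode, they do not need to be filtered out first.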

The idea is to rely on the fact that the lazy quantifier .*? always tries the sequel (whatever comes after it) first before advancing itself. The effect is similar to a look-ahead assertion: whatever is matched by .*? cannot be inside <div class="foo">.*?</div>.

The div is consumed as part of each match in order to advance the cursor past the closing tag. $ is used to match the text after the last matching div.

The s flag makes . match any character, including line separators.

Revision: I had to change .+? to .*?, since .+? cannot handle strings with 2 matching divs next to each other, or strings that start with a matching div.
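
To illustrate the revision, here is a small comparison of the two quantifiers on an assumed input that both starts with a matching div and has two of them back to back. The helper name collectOutside is hypothetical.

```php
<?php
// Hypothetical input: starts with a matching div, immediately followed by another one.
$text = '<div class="foo">A</div><div class="foo">B</div>tail';

// Hypothetical helper: concatenates capturing group 1 of every match.
function collectOutside(string $pattern, string $text): string
{
    preg_match_all($pattern, $text, $matches);
    return implode('', $matches[1]);
}

// .+? must consume at least one character, so it swallows the first div into group 1.
$withPlus = collectOutside('~(.+?)(?:<div class="foo">.*?</div>|$)~is', $text);

// .*? may match nothing, so each div is consumed by the non-capturing group instead.
$withStar = collectOutside('~(.*?)(?:<div class="foo">.*?</div>|$)~is', $text);

var_dump($withPlus, $withStar);
```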

Anyway, it's not a good idea to modify HTML with regular expressions. Use a parser instead.

Okay thank you very much! This seems to be way more complicated than I was thinking. — Frank, May 13 '15 at 11:08

vks · Answer 2 · 2015-05-13T10:08:28.830

0

<div class=\"foo\">.*?</div>\K|.

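You can simply do this by using \K: the first alternative consumes each matching div and then drops it from the reported match, while the second alternative (.) matches every other character one at a time.

\K resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match.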
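
A rough sketch of how this pattern could be used. The answer gives only the bare regex, so the ~ delimiters, the is flags, the call to preg_match_all, and the sample input are all assumptions made here for illustration.

```php
<?php
// Hypothetical input with one matching div.
$text = 'ab<div class="foo">X</div>cd';

// A div yields an empty reported match because of \K;
// every character outside a div is matched on its own by ".".
preg_match_all('~<div class="foo">.*?</div>\K|.~is', $text, $matches);

// Concatenating the full matches leaves only the text outside the divs.
$result = implode('', $matches[0]);
echo $result, "\n";
```

This produces one match per character, so for large inputs the preg_split approach is likely the lighter option.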

edited May 13 '15 at 10:08

answered May 13 '15 at 09:54


@nhahtdh Yup, not an ideal one. Will still leave it for now. – vks May 13 '15 at 10:08
