PHP Reverse Preg_match

Question

if(preg_match("/" . $filter . "/i", $node)) {
    echo $node;
}

This code filters a variable to decide whether to display it or not. An example entry for $filter would be "office" or "164(.*)976".

I would like to know whether there is a simple way to say "if $filter does not match $node" in the form of a regular expression.

So... not an if(!preg_match(...)), but more like $filter = "!office" or "!164(.*)976", only in a form that actually works?

Could you say *why* you don't want to use `!preg_match()`? – Tim Pietzcker Apr 18 '11 at 14:14

score 12 · Accepted Answer · answered Apr 18 '11 at 14:09

This can be done if you definitely want to use a "negative regex" instead of simply inverting the result of the positive regex:

if(preg_match("/^(?:(?!" . $filter . ").)*$/i", $node)) {
    echo $node;
}

will match a string if it doesn't contain the regex/substring in $filter.

Explanation: (taking office as our example string)

^          # Anchor the match at the start of the string
(?:        # Try to match the following:
 (?!       # (unless it's possible to match
  office   # the text "office" at this point)
 )         # (end of negative lookahead),
 .         # Any character
)*         # zero or more times
$          # until the end of the string
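As a minimal sketch of the pattern explained above, the following wraps it in a helper and compares it with plain negation. The function name lacksFilter and the sample values are hypothetical, and the sketch assumes $filter contains no unescaped "/" and $node contains no line breaks.

```php
<?php
// Hypothetical helper: true when $filter (a regex fragment) matches nowhere in $node.
// Assumes $filter contains no unescaped "/" and that $node has no line breaks,
// because "." does not match a newline without the "s" modifier.
function lacksFilter(string $filter, string $node): bool
{
    return preg_match('/^(?:(?!' . $filter . ').)*$/i', $node) === 1;
}

foreach (['Head Office', 'warehouse', '164-555-976'] as $node) {
    foreach (['office', '164(.*)976'] as $filter) {
        // Both results should agree: the negative pattern mirrors !preg_match().
        $viaLookahead = lacksFilter($filter, $node);
        $viaNegation  = !preg_match('/' . $filter . '/i', $node);
        printf("%-12s %-11s %s %s\n", $node, $filter,
            var_export($viaLookahead, true), var_export($viaNegation, true));
    }
}
```

For these single-line inputs the two columns agree, which is why inverting the result of the positive regex is usually the simpler choice.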

Tim Pietzcker

I'm curious, do you have any idea what the performance of this would be as opposed to the `!preg_match()` approach? I'm not in a place where I can test them both. – Justin Morgan - On strike Apr 19 '11 at 21:53
I'd expect this solution to be slower in general than the negation approach because of the added overhead of lookaround assertions. Actual results will depend on whether your input usually matches `$filter` (in which case negation will be faster) or whether it doesn't (in which case this approach may be faster). – Tim Pietzcker Apr 20 '11 at 06:04
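For anyone who wants to measure this rather than guess, here is a rough timing sketch. The helper timeIt, the iteration count, and the sample inputs are all hypothetical; absolute numbers will vary with the PHP version, PCRE JIT settings, and the data, so treat the output as indicative only.

```php
<?php
// Hypothetical micro-benchmark helper: runs $fn repeatedly and returns elapsed seconds.
function timeIt(callable $fn, int $iterations = 20000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$filter = 'office';
// One input that contains the filter, one short input without it, one long input without it.
$nodes = ['head office', 'warehouse', str_repeat('x', 200)];

foreach ($nodes as $node) {
    $negated = timeIt(function () use ($filter, $node) {
        return !preg_match('/' . $filter . '/i', $node);
    });
    $lookahead = timeIt(function () use ($filter, $node) {
        return preg_match('/^(?:(?!' . $filter . ').)*$/i', $node);
    });
    printf("%-12s negated=%.4fs lookahead=%.4fs\n", substr($node, 0, 12), $negated, $lookahead);
}
```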

score 7 · Answer 2 · answered Apr 18 '11 at 14:14

The (?!...) negative assertion is what you're looking for.

To exclude a certain string from appearing anywhere in the subject you can use this double assertion method:

preg_match('/(?=^((?!not_this).)+$)  (......)/xs', $string);

It still lets you specify an arbitrary main regex (......). But you can just leave that out if you only want to forbid a string.
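A small sketch of the double assertion with a main regex filled in. The forbidden word "draft", the four-digit capture, and the function name yearUnlessDraft are hypothetical choices for illustration. Because the lookahead itself contains ^, the whole pattern can only start matching at the beginning of the subject, so the main part here begins with a lazy .*? to reach the digits.

```php
<?php
// Hypothetical example: capture a four-digit number unless the subject contains "draft".
function yearUnlessDraft(string $subject): ?string
{
    // x: whitespace in the pattern is ignored; s: "." also matches newlines.
    // The lookahead forbids "draft" anywhere; the remainder is the "main regex".
    $pattern = '/(?=^(?:(?!draft).)+$) .*? (\d{4})/xs';
    return preg_match($pattern, $subject, $m) === 1 ? $m[1] : null;
}

var_dump(yearUnlessDraft('report 2011 final'));
var_dump(yearUnlessDraft('draft report 2011'));
```

The inner group is written as non-capturing, (?:...), so that the main regex's capture ends up in $m[1].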

mario

Thank you very much for the negative assertion link, this definitely solved my problem. The marked answer is also good, but I liked the detailed information within the page a lot. Thanks so far. – prdatur Sep 04 '12 at 21:12

score 0 · Answer 3 · answered Sep 16 '15 at 22:01

Answer number 2 by mario is the correct answer, and here is why:

First to answer the comment by Justin Morgan,

I'm curious, do you have any idea what the performance of this would be as opposed to the !preg_match() approach? I'm not in a place where I can test them both. – Justin Morgan Apr 19 '11 at 21:53

Consider the gate logic for a moment.

When to negate preg_match(): when you want the condition to be (1) true if the regex is absent, or (2) false if the regex is present.

When to use a negative assertion in the regex: when you want the condition to be true only if the string matches the regex exclusively, and to fail if anything else is found. This is necessary if you really need to test for undesirable characters while allowing the omission of permitted characters.

A single preg_match() call, negated or not, only tests whether one regex is present. If 'bar' is required, and numbers aren't allowed, neither of the following works on its own:

if (preg_match('/bar/', 'foo2bar') === 1) {
  echo "found 'bar'"; // but a number is here, so fail.
}

if (preg_match('/[0-9]/', 'foobar') === 0) {
  echo "no numbers found"; // but didn't test for 'bar', so fail.
}

So, in order to really test multiple conditions, a beginner would use multiple preg_match() calls... we know this is a very amateur way to do it.
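As a sketch of folding both conditions into one pattern, the following requires 'bar' and forbids digits in a single preg_match() call. The function name hasBarAndNoDigits is hypothetical, and this is only one of several ways to combine the two tests.

```php
<?php
// Hypothetical helper: true when $s contains "bar" and no digit anywhere.
function hasBarAndNoDigits(string $s): bool
{
    // ^(?!.*[0-9]) rejects the subject if a digit appears anywhere after the start;
    // .*bar then requires "bar" somewhere in the subject.
    return preg_match('/^(?!.*[0-9]).*bar/s', $s) === 1;
}

foreach (['foobar', 'foo2bar', 'foo'] as $s) {
    printf("%-8s %s\n", $s, var_export(hasBarAndNoDigits($s), true));
}
```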

So, the OP wants to test a string against a regex, with the conditional passing as true only if the string does not match it. For most simple cases, simply negating preg_match() will suffice, but for more complex or extensive regex patterns, it won't. I will use my own situation as a more real-life scenario:

Say you have a form field for a person's name, particularly a last name. You want your system to accept all letters regardless of case and position, accept hyphens and apostrophes, and exclude all other characters. Matching a regex that lists every undesired character is the first thing we think of, but imagine you are supporting UTF-8... that's a lot of characters! Enumerating them would make the pattern nearly as big as the UTF-8 table, all on a single line. Whatever hardware you have, your server application has a finite limit on how long a command can be, not to mention the limit of 200 parenthesized subpatterns, so a pattern listing the ENTIRE UTF-8 character table (minus [A-Z], [a-z], -, and ') is too long, never mind that the program itself would be HUGE!

Since we won't run an if (!preg_match('.#\\$\%... (which can become very long and impossible to evaluate) on a string to see whether it is bad, we should instead test the easier way: put a negative lookahead assertion in the regex, then negate the overall result:

<?php
  $string = "O'Reilly-Finlay";
  // (?![a-z'-]) asserts that the next character is not a permitted one; the "." then consumes it.
  if (preg_match('/(?![a-z\'-])./is', $string) === 0) {
    echo "the given string matched exclusively for regex pattern";
    // On a pattern error preg_match() returns false, which is not identical to 0
    // (we test for identity, not equality), so this branch is not taken on error.
  } else {
    echo "the given string did not match exclusively to the regex pattern";
  }
?>

If we only looked for the regex /[a-z\'-]/i, all we would say is "match the string if it contains ANY of those things", so bad characters aren't tested. If we negated that at the function, we would say "return false if we find a match that contains any of these things". This isn't right either. What we need to say is "fail if we match ANYTHING not in the permitted set", which is what the lookahead does. I know the bells are going off in someone's head, and they are thinking wildcard-expansion style... no, a lookahead doesn't do that; it is a zero-width test applied at each position the engine tries. At the first character, the lookahead checks whether a permitted character follows. If one does, the assertion fails there and the engine moves on to the next position, until it finds a non-permitted character or reaches the end. If a non-permitted character is found, the '.' consumes it and preg_match() returns 1 (with that character in the match array, if one was passed); otherwise it returns 0. In short, a negative assertion on regex 'a' is the equivalent of matching regex 'b', where 'b' contains EVERYTHING ELSE not matchable by 'a'. Great for when 'b' would be ungodly extensive.

Note: if my regex has an error in it, I apologize... I have been using Lua for the last few months, so I may be mixing my regex rules. Otherwise, '(?!...)', with the parentheses, is the proper negative lookahead syntax for PHP.
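The last-name check discussed in this answer can be sketched as follows. The function names are hypothetical, and the permitted set (ASCII letters, apostrophe, hyphen) is taken from the example above. A negative lookahead is written with parentheses, (?!...), and needs a following "." to consume the offending character; a negated character class, shown alongside for comparison, expresses the same test.

```php
<?php
// Hypothetical helpers; both report whether $s contains a character
// outside the permitted set (ASCII letters, apostrophe, hyphen).

// Lookahead form: at some position the next character is not permitted, and "." consumes it.
function hasForbiddenCharLookahead(string $s): bool
{
    return preg_match('/(?![a-z\'-])./is', $s) === 1;
}

// Negated character class: matches any single character that is not in the set.
function hasForbiddenCharClass(string $s): bool
{
    return preg_match('/[^a-z\'-]/i', $s) === 1;
}

foreach (["O'Reilly-Finlay", 'Smith2', 'van der Berg'] as $name) {
    printf("%-16s lookahead=%s class=%s\n", $name,
        var_export(hasForbiddenCharLookahead($name), true),
        var_export(hasForbiddenCharClass($name), true));
}
```

Note that neither form lists the forbidden characters explicitly; both describe them as "anything not in the permitted set".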
