
I have a need to detect whether a regular expression is valid, so that invalid ones can be gracefully rejected in a user interface. On Stack Overflow there is a clever abomination to do this with another regular expression, which I plan to strenuously avoid.

There is a much simpler approach of running a match and checking for errors, which returns the correct boolean result, but it would be interesting to get the failure reason/message as well:

// The error is that preg delimiters are missing
$testRegex = 'Location: (.+)';

// This bit is fine
$result = preg_match($testRegex, ''); // returns false i.e. failure
$valid = is_int($result); // false, i.e. the regex is invalid

// Returns PREG_NO_ERROR, which means no error occurred
echo preg_last_error() . "\n";

If I run this I correctly get:

PHP Warning: preg_match(): Delimiter must not be alphanumeric or backslash in ... on line ...

However, the output of the error function is 0, which is equal to PREG_NO_ERROR. I would have thought this would return a non-zero error code -- and it would be even better if I can get my hands on a clean version of the warning message.

It is of course possible that this is not generally available (i.e. is just available to the PHP engine for the purposes of printing the warning). I am running 5.5.3-1ubuntu2.6 (cli).

halfer
  • Create your own error handler, would that be an option ? – Rizier123 May 02 '15 at 18:00
  • I suppose `preg_last_error` doesn't return anything because in this particular case the regex pattern didn't even make it to the PCRE engine. The pattern delimiters are mandated by PHP, and PCRE itself doesn't expect them. PHP strips the delimiters from the pattern and passes *that* to PCRE. Try to use an invalid pattern, but with delimiters, and see if `preg_last_error` returns something meaningful. – Lucas Trzesniewski May 02 '15 at 18:05
  • @Lucas, thanks - that sounds like an accurate summary of what is happening. – halfer May 02 '15 at 18:51
  • I would ask the user to enter the `regex` only (no delimiters, separate input fields for flags) then combine them into the code and produce the final string to feed to `preg_match()` and friends. Something like: `'/'.preg_quote($_GET['regex']).'/'.$modifiers`. Check http://regex101.com for inspiration. – axiac May 02 '15 at 19:12
  • @axiac Don't forget to specify the delimiter in the `preg_quote()` call. I also forget them all the time: `preg_quote($_GET["regex"], "/")` – Rizier123 May 02 '15 at 19:17
  • ^ @axiac, thanks - that's worth consideration. The only dilemma is if the regex is for, say, a URL, then the fixed slash will mean all the literal slashes will need escaping. If this can be switched on a case-by-case basis, e.g. to `~~`, then this issue is avoided. – halfer May 02 '15 at 19:18
  • @halfer So is your question answered or are you still looking for something different? – Rizier123 May 02 '15 at 19:26
  • @Rizier123 you're right. Thanks for mentioning it. – axiac May 02 '15 at 19:42
  • @Rizier123, all answered, thanks! Your answer and Ivan's have been most helpful. – halfer May 02 '15 at 19:54
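
To illustrate the distinction Lucas draws in the comments above, here is a minimal sketch contrasting a pattern that fails to compile with a pattern that compiles but fails while matching. The variable names are my own, and the invalid-UTF-8 subject is just one convenient way to provoke a run-time failure. I have deliberately not stated what `preg_last_error()` reports after the compile failure, since that appears to depend on the PHP version (the question reports `PREG_NO_ERROR` on 5.5).

```php
<?php
// Compile-time failure: the pattern has delimiters but an unclosed group,
// so PHP emits a warning and preg_match() returns false.
$compileResult = @preg_match('/(unclosed/', 'abc');

// Run-time failure: the pattern compiles, but the subject is not valid
// UTF-8 although the /u modifier requires it.
$runtimeResult = @preg_match('/./u', "\xff");
$runtimeCode   = preg_last_error();

var_dump($compileResult);                         // false
var_dump($runtimeCode === PREG_BAD_UTF8_ERROR);   // run-time errors are what preg_last_error() is for
```

So a `false` return value signals failure in both cases, but only the run-time case is reliably described by `preg_last_error()`; the reason for a compile failure has to be read from the warning itself.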

2 Answers

2

This should work for you:

Here I turn on output buffering with ob_start() so that the warning is not printed, capture the last error with error_get_last(), and then discard the buffer with ob_end_clean(). After this you can check whether the result is empty; if it is not, an error occurred.

ob_start();
$result = preg_match(".*", "Location: xy");
$error = error_get_last();
ob_end_clean();

if(!empty($error))
    echo "<b>Error:</b> " . $error["message"];
else
    echo "no error found!";

output:

Error: preg_match(): No ending delimiter '.' found

EDIT:

If you want you can create your own error handler, which basically just throws an Exception for each error which you would normally get.

Then you can put your code into a try - catch block and catch the exception.

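// $errcontext is omitted: PHP 8 no longer passes it to error handlers
set_error_handler(function($errno, $errstr, $errfile, $errline) {
    // error is excluded by error_reporting, e.g. suppressed with the @-operator
    // (the bitmask check works on PHP 8 as well as on older versions)
    if (!(error_reporting() & $errno)) {
        return false;
    }
    throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
});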

try {
    $result = preg_match(".*", "Location: xy");
} catch(Exception $e) {
    echo "<b>Error:</b> " . $e->getMessage();
}

The code for the error handler is from Philippe Gerber in this answer
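
One caveat with the handler above is that it stays installed for the rest of the script. A possible variation, sketched here under my own assumptions (the function name `checkRegex` is hypothetical, and the handler is limited to `E_WARNING`), installs the handler only for the duration of the check and restores the previous one afterwards:

```php
<?php
/**
 * Returns null if $pattern compiles, otherwise the warning text.
 * Hypothetical helper, not part of PHP.
 */
function checkRegex(string $pattern): ?string
{
    // Convert warnings into exceptions, but only while this function runs.
    set_error_handler(function (int $errno, string $errstr, string $errfile, int $errline): bool {
        throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
    }, E_WARNING);

    try {
        preg_match($pattern, '');
        return null;
    } catch (ErrorException $e) {
        return $e->getMessage();
    } finally {
        // Always put the previous handler back, even after an exception.
        restore_error_handler();
    }
}

var_dump(checkRegex('~^https?://~')); // NULL, the pattern is valid
var_dump(checkRegex('.*'));           // a string describing the delimiter problem
```

This keeps the rest of the application's error handling untouched, which matters if the check runs inside a larger framework.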

Rizier123
  • Very creative solution, thanks! Turns out there is a PHP function to do this directly - see Ivan's answer. Don't know how I wasn't aware of it... `:-)` – halfer May 02 '15 at 18:53
  • @halfer Would an error handler also be okay, because then you can do it much easier? – Rizier123 May 02 '15 at 18:54
  • @halfer *Turns out there is a PHP function to do this directly* If you mean `error_get_last()`? I use the same in my code ^. – Rizier123 May 02 '15 at 18:56
  • Oops, I didn't read your answer carefully enough! Apologies, I have not consumed enough coffee today. In that case, you don't need the output buffering - just use `@` on the preg function. – halfer May 02 '15 at 18:57
  • @halfer But I wouldn't recommend you to use it! ^ As asked above would you be okay with your own error handler, then you can do it much cleaner than my current answer ?! – Rizier123 May 02 '15 at 18:58
  • I don't know what you mean by my own error handler, but I expect that would be fine. What is the problem with error suppression here? I understand very well the reason why it should not be used in general, but it is being explicitly handled in this case. – halfer May 02 '15 at 18:59
  • @halfer See my updated answer, might be a bit cleaner if you want to use your own error handler – Rizier123 May 02 '15 at 19:05
  • It's usually bad practice, when it's used for things that can be handled with regular code, like undefined index on array access. However many PHP functions will automatically emit an error like this, which should really be an exception. In that case, it's okay to use a silencer (@) and pick up the `error_get_last()` information. – Ivan Batić May 02 '15 at 19:06
1

Maybe you could use error_get_last() to get a little bit more information.

Array
(
    [type] => 2
    [message] => preg_match(): Delimiter must not be alphanumeric or backslash
    [file] => /Users/ivan/Desktop/test.php
    [line] => 6
)

The type is E_WARNING, and you can rely on the message starting with the function name, since it always has the same format.

You can then do

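$lastError = error_get_last(); // null if no error has been recorded
$error = null;
if ($lastError !== null && strpos($lastError['message'], 'preg_match(): ') === 0) {
    // 14 is the length of the 'preg_match(): ' prefix
    $error = substr($lastError['message'], 14);
}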

And $error will be var_dump'ed to

string(47) "Delimiter must not be alphanumeric or backslash"

or to NULL if the last error did not come from preg_match().

Also, in response to another answer, you can suppress warnings by using @preg_match(...) so you don't have to handle output buffers yourself. error_get_last() will still catch the error.
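
Putting the pieces of this answer together, a helper along the following lines might be a reasonable starting point. This is only a sketch: the function name `regexError` is hypothetical, and it assumes PHP 7.0 or later because it calls `error_clear_last()` so that an older, unrelated error is not mistaken for a regex failure.

```php
<?php
/**
 * Returns null if $pattern is usable, otherwise a cleaned-up error message.
 * Hypothetical helper, not part of PHP.
 */
function regexError(string $pattern): ?string
{
    $prefix = 'preg_match(): ';

    error_clear_last();              // forget any earlier, unrelated error
    $result = @preg_match($pattern, '');
    $last   = error_get_last();

    if ($result !== false || $last === null) {
        return null;                 // pattern compiled and ran
    }

    // Strip the function-name prefix when present, keep the message otherwise.
    $message = $last['message'];
    if (strpos($message, $prefix) === 0) {
        $message = substr($message, strlen($prefix));
    }
    return $message;
}

var_dump(regexError('/Location: (.+)/')); // NULL
var_dump(regexError('Location: (.+)'));   // a string beginning with "Delimiter must not be ..."
```

The exact wording of the message is chosen by PHP and may differ between versions, so it is better suited to showing to the user than to comparing against fixed strings.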

Ivan Batić