Confusion about preg_match return value in PHP if statement

Question

I'm using this code, which is taken nearly verbatim from Programming PHP by Rasmus Lerdorf (page 291), to make sure that the value in a POSTed form field, 'story_title', contains only alphabetic letters, spaces, single quotes or hyphens:

if (preg_match('/[^A-Za-z \'\-]/', $_POST['story_title'])) {
    die('Illegal characters in story_title.');
} else { 
    $story_title = $_POST['story_title'];
}

I'm not very experienced with PHP, but I read this as:

Does $_POST['story_title'] contain letters, hyphens, single quotes and/or spaces?
    Yes? Error!
    No? Excellent. Proceed.

What? The code "works" in that it does filter out unwanted characters (<, >, (, ), etc.), but I don't understand how or why it's working. It seems like the if statement with the preg_match should flow the opposite way, with the preg_match returning true if only legal characters are present and false if not.

The documentation says that "preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred" and also warns that "This function may return Boolean FALSE, but may also return a non-Boolean value which evaluates to FALSE," which only makes things murkier for me. Shouldn't the function be returning 1 when $_POST['story_title'] contains only legal characters?

Could somebody kindly explain what exactly is happening in this code? I'm glad it seems to work, but I have a hard time using code I don't understand.

`[^A-Za-z \'\-]` means if input doesn't contain any of the characters listed here. Note `^` at the start for negation. — anubhava, Jan 30 '14 at 15:44
All this time I think I'm not understanding `preg_match` and I've just totally blanked on the regex itself. Thank you very much; problem solved. — keekeejeejee, Jan 30 '14 at 15:49
[Wrangling function preg_match()](https://stackoverflow.com/questions/6254239/preg-match-if-not/6254296#6254296), especially the unexpected kind of return values. It is infuriating. The official documentation is of very little help in this regard (too cryptic). — Peter Mortensen, Feb 23 '22 at 22:37

score 1 · Accepted Answer · answered Jan 30 '14 at 15:50

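In regular expressions, [a-z] means "any one character in the range a to z", whereas [^a-z] means "any one character that is not in the range a to z". So the pattern in the question matches as soon as the input contains a single character outside the allowed set. preg_match then returns 1, which is why the if branch is the one that reports the error.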
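Here is a minimal sketch of how the check from the question behaves, assuming PHP 7 or later for the type declarations. The function name hasIllegalChars is made up for illustration and does not come from the book. It also compares the result of preg_match strictly, since the function returns 1, 0, or false, and both 0 and false are falsy in a plain if.

```php
<?php
// Hypothetical helper (not from the book): returns true when $title
// contains at least one character outside the allowed set.
function hasIllegalChars(string $title): bool
{
    // [^...] is a negated class: it matches one character that is NOT
    // a letter, space, single quote, or hyphen.
    $result = preg_match('/[^A-Za-z \'\-]/', $title);

    if ($result === false) {
        // preg_match() returns false on error; do not confuse it with 0.
        throw new RuntimeException('preg_match failed');
    }

    return $result === 1;
}

var_dump(hasIllegalChars("O'Brien-Smith story")); // bool(false)
var_dump(hasIllegalChars('<script>'));            // bool(true)

// Positive form: the whole string must consist of allowed characters.
// The D modifier makes $ match only at the very end of the string.
var_dump(preg_match('/^[A-Za-z \'\-]+$/D', "O'Brien-Smith story")); // int(1)
var_dump(preg_match('/^[A-Za-z \'\-]+$/D', '<script>'));            // int(0)
```

The positive form reads closer to what the question expected ("returns 1 when only legal characters are present"). One difference to keep in mind: because of the +, it rejects an empty string, whereas the negated check lets an empty string through.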

Right you are. I don't know how I didn't mentally process the `^`. Thank you. – keekeejeejee Jan 30 '14 at 15:52
