
I'm using this code, which is taken nearly verbatim from Programming PHP by Rasmus Lerdorf (page 291), to make sure that the value in a POSTed form field, 'story_title', contains only letters, spaces, single quotes, or hyphens:

```php
if (preg_match('/[^A-Za-z \'\-]/', $_POST['story_title'])) {
    die('Illegal characters in story_title.');
} else {
    $story_title = $_POST['story_title'];
}
```
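To see which way the test actually flows, here is a small sketch that runs the question's pattern against a few made-up sample titles (the titles are hypothetical; the pattern is copied from the code above) and prints what `preg_match()` returns for each:

```php
<?php
// The pattern from the question; the sample titles below are hypothetical.
$pattern = '/[^A-Za-z \'\-]/';

$samples = [
    "The Cat's Tale",      // letters, spaces, single quote: all allowed
    'Well-Known Story',    // hyphen is allowed
    '<script>',            // contains < and >
    'Chapter (1)',         // contains ( ) and a digit
];

foreach ($samples as $title) {
    // preg_match() returns 1 if it finds at least one character outside
    // the allowed set, and 0 if every character is allowed.
    $result = preg_match($pattern, $title);
    echo var_export($title, true), ' => ', $result, PHP_EOL;
}
```

The legal titles produce 0 and the ones with `<`, `>`, `(` or `)` produce 1, so a "match" here means "an illegal character was found".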

I'm not very experienced with PHP, but I read this as:

> Does `$_POST['story_title']` contain letters, hyphens, single quotes and/or spaces?
> - Yes? Error!
> - No? Excellent. Proceed.

What? The code "works" in that it does filter out unwanted characters (<, >, (, ), etc.), but I don't understand how or why it's working. It seems like the if statement with the preg_match should flow the opposite way, with the preg_match returning true if only legal characters are present and false if not.
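For comparison, a check that flows the way the question expects (a match means "only legal characters") needs the character class anchored to the whole string. This is a minimal sketch; the helper name `has_only_legal_chars` is hypothetical, and it uses `\z` rather than `$` so that a trailing newline is not accepted:

```php
<?php
// Hypothetical helper: true only when every character is allowed.
// Outside the brackets, ^ is an anchor (start of string), not a negation.
// The * repeats the class, and \z requires the match to reach the very end,
// so any single disallowed character anywhere makes the match fail.
// Like the original code, this accepts the empty string.
function has_only_legal_chars(string $value): bool
{
    return preg_match('/^[A-Za-z \'\-]*\z/', $value) === 1;
}
```

Used in place of the original test, the condition would read `if (!has_only_legal_chars($_POST['story_title'])) { die(...); }`. Both versions accept and reject the same strings; the original is simply phrased as "look for one bad character" instead of "verify that all characters are good".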

The documentation says that "preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred" and also warns that "This function may return Boolean FALSE, but may also return a non-Boolean value which evaluates to FALSE," which only makes things murkier for me. Shouldn't the function be returning 1 when $_POST['story_title'] contains only legal characters?
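The warning about non-Boolean falsy values matters because 0 ("no match") and `false` ("error") both count as false in a plain `if`. A sketch that tells the three outcomes apart with strict comparisons follows; the function name `check_title` and its return strings are hypothetical, and `preg_last_error_msg()` requires PHP 8.0 or later:

```php
<?php
// Hypothetical helper that distinguishes all three preg_match() results.
function check_title(string $title): string
{
    $result = preg_match('/[^A-Za-z \'\-]/', $title);

    if ($result === false) {
        // The regex engine itself failed (for example, a backtracking limit
        // was hit); preg_last_error_msg() describes why (PHP 8.0+).
        return 'error: ' . preg_last_error_msg();
    }
    if ($result === 1) {
        return 'illegal';   // at least one disallowed character was found
    }
    return 'ok';            // $result === 0: no disallowed character
}
```

With `===`, an error can no longer be mistaken for "no illegal characters", which a plain truthiness check would do.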

Could somebody kindly explain what exactly is happening in this code? I'm glad it seems to work, but I have a hard time using code I don't understand.

keekeejeejee
  • `[^A-Za-z \'\-]` matches any single character that is not one of the characters listed here. Note the `^` at the start, which negates the class. – anubhava Jan 30 '14 at 15:44
  • All this time I thought I wasn't understanding `preg_match`, and I'd just totally blanked on the regex itself. Thank you very much; problem solved. – keekeejeejee Jan 30 '14 at 15:49
  • [Wrangling function preg_match()](https://stackoverflow.com/questions/6254239/preg-match-if-not/6254296#6254296), especially the unexpected kind of return values. It is infuriating. The official documentation is of very little help in this regard (too cryptic). – Peter Mortensen Feb 23 '22 at 22:37

1 Answer


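In regular expressions, `[a-z]` means "any character in the group a to z", whereas `[^a-z]` means "any character in the group that contains everything but a to z". So `[^A-Za-z \'\-]` matches any single character that is not a letter, space, single quote, or hyphen, and `preg_match()` returns 1 as soon as it finds one. A return value of 1 therefore means the title contains an illegal character, which is why the code calls `die()` in that branch.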
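A minimal sketch of the difference, using the optional third argument of `preg_match()` to capture which character each class matched (the subject strings are made up for illustration):

```php
<?php
$subject = 'abc1';

// [a-z]: any one character from a to z; the first match is 'a'.
preg_match('/[a-z]/', $subject, $m);
echo $m[0], PHP_EOL; // a

// [^a-z]: any one character that is NOT a to z; the first match is '1'.
preg_match('/[^a-z]/', $subject, $m);
echo $m[0], PHP_EOL; // 1

// 'abc' contains no such character, so preg_match() returns 0.
var_dump(preg_match('/[^a-z]/', 'abc')); // int(0)
```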