I'm validating an internationalized name such as L'étoile with this regex:

/^[\pL',-.\s]+$/

When I capture the input and run it through the regex, there is no match:

 <input type="text" name="firstname" value="">
 $value = trim($_POST['firstname']);
 $pattern = "/^[\pL',-.\s]+$/";
 print $value.'<br />';
 print preg_match_all($pattern, $value, $match); 

This prints:
L'étoile
0

However, when I hard-code the same string as below, it matches just fine.

$value = "L'étoile";
$pattern = "/^[\pL',-.\s]+$/";
print $value.'<br />';
print preg_match_all($pattern, $value, $match);     

This prints: 
L'�toile   
1

1 Answer

You're missing the u pattern modifier in your regex:

u (PCRE_UTF8)

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern and the subject is checked since PHP 4.3.5. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid since PHP 5.3.4 (resp. PCRE 7.3 2007-08-28); formerly those have been regarded as valid UTF-8.
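
To illustrate the "invalid subject" part of that description, here is a minimal sketch. It assumes the subject arrives in ISO-8859-1 rather than UTF-8; the variable names are only illustrative, and the bytes are written as explicit escapes so the result does not depend on how the file itself is saved.

```php
<?php
// The byte 0xE9 is "é" in ISO-8859-1, but on its own it is not valid UTF-8.
$latin1 = "L'\xE9toile";

// With the u modifier the subject is validated as UTF-8 before matching.
$result = preg_match('/^[\pL\',-.\s]+$/u', $latin1);
$error  = preg_last_error();

var_dump($result);                        // bool(false): an error, not "0 matches"
var_dump($error === PREG_BAD_UTF8_ERROR); // bool(true)
```

Checking preg_last_error() is a cheap way to tell an encoding problem apart from a name that simply contains a disallowed character.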

I would also recommend single-quoting your pattern instead of double-quoting it, so that PHP does not interpolate variables or process escape sequences inside the string before the regex engine sees it.

Instead of:

preg_match_all("/^[\pL',-.\s]+$/", $value, $match);   

Use:

preg_match_all('/^[\pL\',-.\s]+$/u', $value, $match);   
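
As a rough side-by-side check, the sketch below runs the same pattern with and without the u modifier. It assumes the form data arrives as UTF-8, and it spells out the two bytes of "é" explicitly so the outcome does not depend on the encoding of the script file.

```php
<?php
// UTF-8 bytes for "L'étoile": "é" is the two-byte sequence 0xC3 0xA9.
$value = "L'\xC3\xA9toile";

// Without u, the subject is read one byte at a time, so "é" is seen as
// two separate characters instead of one letter.
$withoutU = preg_match('/^[\pL\',-.\s]+$/', $value);

// With u, the two bytes are decoded as the single letter U+00E9.
$withU = preg_match('/^[\pL\',-.\s]+$/u', $value);

var_dump($withoutU); // the question reports no match for this case
var_dump($withU);    // int(1)
```

Inside the character class, `,-.` is read as a range from the comma to the period, which happens to cover exactly the comma, hyphen, and period, so the pattern behaves as intended.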