Because the documentation is wrong, and unfortunately this is not the only place where it is.
PHP uses PCRE under the hood to implement its preg_* functions. PCRE's documentation is thus authoritative there. PHP's documentation is based on PCRE's, but it looks like you found yet another mistake.
Here's what you can read in PCRE's docs (emphasis mine):
By default, characters with values greater than 128 do not match any of the POSIX character classes. However, if the PCRE_UCP option is passed to pcre_compile(), some of the classes are changed so that Unicode character properties are used. This is achieved by replacing certain POSIX classes by other sequences, as follows:
[:alnum:] becomes \p{Xan}
[:alpha:] becomes \p{L}
[:blank:] becomes \h
[:digit:] becomes \p{Nd}
[:lower:] becomes \p{Ll}
[:space:] becomes \p{Xps}
[:upper:] becomes \p{Lu}
[:word:] becomes \p{Xwd}
If you dig further in PHP's docs, you'll find the following:
u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern and the subject is checked since PHP 4.3.5. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid since PHP 5.3.4 (resp. PCRE 7.3 2007-08-28); formerly those have been regarded as valid UTF-8.
This is, unfortunately, a lie. The u modifier in PHP means PCRE_UTF8 | PCRE_UCP (UCP stands for Unicode Character Properties). The PCRE_UCP flag is the one that changes the meaning of \d, \w and the like, in the same way it changes the POSIX classes in the PCRE docs quoted above. Your tests confirm that.
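If you want to check this yourself, here is a minimal sketch. It assumes a reasonably recent PHP whose bundled PCRE was built with Unicode property support. The subjects are written as explicit UTF-8 byte escapes so the result does not depend on how the file is saved. The variable names are mine, chosen only for illustration.

```php
<?php
// Subjects as raw UTF-8 bytes, so the source file encoding does not matter.
$digit  = "\xD9\xA3"; // U+0663 ARABIC-INDIC DIGIT THREE (category Nd)
$letter = "\xC3\xA9"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE (category Ll)

// Without u: the subject is a sequence of bytes, and each subject here
// is two bytes long, so a pattern for exactly one character cannot match.
var_dump(preg_match('/^\d$/', $digit));            // int(0)
var_dump(preg_match('/^[[:alpha:]]$/', $letter));  // int(0)

// With u: UTF-8 mode plus Unicode character properties (the PCRE_UCP
// behaviour), so \d, \w and the POSIX classes become Unicode-aware.
var_dump(preg_match('/^\d$/u', $digit));           // int(1)
var_dump(preg_match('/^\w$/u', $letter));          // int(1)
var_dump(preg_match('/^[[:alpha:]]$/u', $letter)); // int(1)
```

If the last three calls return 0 on your build, the u modifier is most likely not enabling PCRE_UCP there, which is worth knowing before you rely on \d or \w for validation.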
As a side note, don't infer properties of one regex flavor from another. It doesn't always work (heh, even this chart forgot about the PCRE_UCP option).