I don't know how Perl determines whether to use Unicode, ASCII, or locale rules by default (no flag, no use). Regardless, by declaring use re '/a'; (ASCII), use re '/u'; (Unicode), or use re '/l'; (locale), you clearly signal to the Perl interpreter (and to the human reader) which mode you want, and you avoid unexpected behaviour.
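As a minimal sketch of how the pragma scopes the mode, the two helper subs below (hypothetical names, not part of any library) each declare a different default and test the same character. It assumes Perl 5.14 or later, where use re '/a' and friends are available.

```perl
use strict;
use warnings;

# Hypothetical helper: every regex in this sub defaults to ASCII rules.
sub digit_under_ascii {
    use re '/a';
    my ($str) = @_;
    return $str =~ /\d/ ? 1 : 0;
}

# Hypothetical helper: every regex in this sub defaults to Unicode rules.
sub digit_under_unicode {
    use re '/u';
    my ($str) = @_;
    return $str =~ /\d/ ? 1 : 0;
}

my $thai_five = "\x{0E55}";   # THAI DIGIT FIVE

print "use re '/a': ", (digit_under_ascii($thai_five)   ? "match" : "no match"), "\n";
print "use re '/u': ", (digit_under_unicode($thai_five) ? "match" : "no match"), "\n";
```

The pragma is lexically scoped, so each declaration only affects the regexes inside its own block.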
Due to the effect of these modifiers, \d has at least 2 meanings:
- Under the /a flag (ASCII), \d matches the digits 0 to 9 (no more and no less).
- Under the /u flag (Unicode), \d matches any decimal digit in any script and is equivalent to \p{Digit}. This effectively makes \d+ pretty useless and dangerous to use, since it allows a mix of digits from different scripts.
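The difference between the two meanings can be illustrated with the flags written directly on the regex. This is only a sketch. The test string is an arbitrary example that mixes ASCII, Arabic-Indic, and Thai digits.

```perl
use strict;
use warnings;

# ASCII "12" followed by ARABIC-INDIC DIGIT THREE and THAI DIGIT FOUR.
my $mixed = "12\x{0663}\x{0E54}";

my $unicode_ok = $mixed =~ /\A\d+\z/u ? 1 : 0;   # \d means \p{Nd} here
my $ascii_ok   = $mixed =~ /\A\d+\z/a ? 1 : 0;   # \d means [0-9] here
my $plain_ok   = "2024" =~ /\A\d+\z/a ? 1 : 0;   # pure ASCII digits

print "mixed digits under /u: ", ($unicode_ok ? "accepted" : "rejected"), "\n";
print "mixed digits under /a: ", ($ascii_ok   ? "accepted" : "rejected"), "\n";
print "2024 under /a:         ", ($plain_ok   ? "accepted" : "rejected"), "\n";
```

Under /u the mixed string passes as "all digits", which is exactly the risk described above. Under /a it is rejected.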
Quote from the description of the /u flag:
"And, \d+, may match strings of digits that are a mixture from different writing systems, creating a security issue. num() in Unicode::UCD can be used to sort this out. Or the /a modifier can be used to force \d to match just the ASCII 0 through 9."
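A rough sketch of the num() approach mentioned in the quote: num() from Unicode::UCD returns the numeric value of a string of decimal digits when it considers the string safe, and undef otherwise (for example when digits from different scripts are mixed). The helper name is_safe_number is hypothetical and only wraps that check.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# Hypothetical helper: true if num() accepts the string as a safe number,
# i.e. decimal digits that all belong to the same script.
sub is_safe_number {
    my ($str) = @_;
    my $value = num($str);   # undef when num() does not consider it safe
    return defined $value ? 1 : 0;
}

print "ASCII 1234:       ", is_safe_number("1234"), "\n";
print "Arabic-Indic 34:  ", is_safe_number("\x{0663}\x{0664}"), "\n";
print "ASCII + Arabic:   ", is_safe_number("12\x{0663}"), "\n";
```

The first two strings should be accepted and the mixed one rejected, so a check like this can follow a /u match when digits from other scripts are meant to be allowed but not mixed.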
\d will not match any sign or punctuation, since those characters do not belong to the Nd (Number, decimal digit) General Category of Unicode.
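To make that concrete, here is a small sketch that anchors \d+ to the whole string and tries a few inputs. The helper only_decimal_digits and the sample strings are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when every character is a decimal digit (Nd).
sub only_decimal_digits {
    my ($str) = @_;
    return $str =~ /\A\d+\z/u ? 1 : 0;
}

my @cases = (
    [ "42",    "plain digits"        ],
    [ "-42",   "leading minus sign"  ],
    [ "+42",   "leading plus sign"   ],
    [ "4.2",   "decimal point"       ],
    [ "4,200", "thousands separator" ],
);

for my $case (@cases) {
    my ($str, $label) = @$case;
    printf "%-20s %s\n", $label, only_decimal_digits($str) ? "match" : "no match";
}
```

Only the first case matches. Signs and separators have to be handled explicitly in the pattern, for example with an optional [+-] in front of \d+.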