Does regular expression \d match minus sign and/or decimal point?

Question

I'm looking at some old Perl/CGI code to debug an issue and noticed a lot of uses of:

\d - Match digit character
\D - Match non-digit character

Most online docs say that \d is the same as [0-9], which is how I've always thought of it. But I've also noticed Stack Overflow questions that mention character-set differences.

Does "\d" in regex mean a digit?

Does \d also match a minus sign and/or decimal point?

I'm off to do some testing.

Use `[+-]?\d+(?:\.\d+)?` to match an optional plus or minus sign and an optional decimal part ... — HamZa, May 06 '13 at 23:42
I was thinking of something more strict, such as /^[0-9]+$/ , which should match from start to end with one or more 0-9. — jjwdesign, May 06 '13 at 23:47
I thought you wanted to match decimal (+-) numbers :p If you want to match only digits then `/^\d+$/` is fine. — HamZa, May 06 '13 at 23:52

score 11 · Answer 1 · answered May 06 '13 at 23:39

Does \d also match a minus sign and/or decimal point?

NO

Kent


nhahtdh · Accepted Answer · 2013-05-07T04:27:13.760

I don't know how Perl determines whether to use Unicode, ASCII, or locale rules by default (no flag, no use). Regardless, declaring use re '/a'; (ASCII), use re '/u'; (Unicode), or use re '/l'; (locale) clearly signals to the Perl interpreter (and to human readers) which mode you want and avoids unexpected behaviour.

Due to the effect of modifiers, \d has at least 2 meanings:

Under the /a flag (ASCII), \d matches the digits 0 to 9 (no more and no less).
Under the /u flag (Unicode), \d matches any decimal digit in any script, and is equivalent to \p{Digit}. This makes \d+ pretty useless and dangerous for validation, since it accepts a mix of digits from different scripts.
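The two modes can be compared side by side. This is a minimal sketch, assuming Perl 5.14 or later (where the /a and /u modifiers and the use re '/flags' form are available); the variable names and the choice of U+0663 as a test character are illustrative, not from the original answer.

```perl
use 5.014;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE, written as an escape so the source stays ASCII.
my $arabic_three = "\x{0663}";

my ($ascii_match, $unicode_match);

{
    # ASCII rules for every regex in this lexical scope: \d is [0-9] only.
    use re '/a';
    $ascii_match = $arabic_three =~ /^\d$/ ? 1 : 0;
}

{
    # Unicode rules for every regex in this lexical scope: \d is any Nd character.
    use re '/u';
    $unicode_match = $arabic_three =~ /^\d$/ ? 1 : 0;
}

print "U+0663 matches \\d under /a: $ascii_match\n";    # prints 0
print "U+0663 matches \\d under /u: $unicode_match\n";  # prints 1
```

The same effect is available per pattern by writing the modifier inline, as in /^\d$/a.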

Quote from the description of the /u flag:

And, \d+ , may match strings of digits that are a mixture from different writing systems, creating a security issue. num() in Unicode::UCD can be used to sort this out. Or the /a modifier can be used to force \d to match just the ASCII 0 through 9.
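The mixed-script problem described in the quote can be reproduced with a short script. This is a sketch, assuming Perl 5.14 or later so that Unicode::UCD exports num(); the test string (ASCII "12" followed by U+0663) is a made-up example.

```perl
use 5.014;
use warnings;
use Unicode::UCD qw(num);

# ASCII "12" followed by U+0663 ARABIC-INDIC DIGIT THREE: digits from two scripts.
my $mixed = "12\x{0663}";

# Under Unicode rules the whole string passes a digits-only check.
my $mixed_matches_u = $mixed =~ /^\d+$/u ? 1 : 0;

# Under ASCII rules the same check rejects it.
my $mixed_matches_a = $mixed =~ /^\d+$/a ? 1 : 0;

# num() returns undef unless all the digits come from a single script.
my $mixed_value = num($mixed);
my $plain_value = num("123");

print "mixed matches /^\\d+\$/u: $mixed_matches_u\n";   # prints 1
print "mixed matches /^\\d+\$/a: $mixed_matches_a\n";   # prints 0
printf "num(mixed): %s\n", defined($mixed_value) ? $mixed_value : 'undef';  # prints undef
printf "num(123):   %s\n", defined($plain_value) ? $plain_value : 'undef';  # prints 123
```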

\d will not match any sign or punctuation, since those characters do not belong to the Nd (Number, decimal digit) General Category of Unicode.
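A quick check makes this concrete. The sketch below probes a few sign and punctuation characters against \d, then contrasts a digits-only pattern with the signed-decimal pattern suggested in the comments on the question (anchored here with ^ and $). The variable names and sample values are illustrative.

```perl
use strict;
use warnings;

# None of these characters is a decimal digit, so none should match \d.
my @probes  = ('-', '+', '.', ',');
my @matched = grep { /\d/ } @probes;

my $signed = '-3.14';

# A digits-only pattern rejects the sign and the decimal point.
my $digits_only = $signed =~ /^\d+$/ ? 1 : 0;

# The sign and the decimal point have to be spelled out explicitly.
my $signed_decimal = $signed =~ /^[+-]?\d+(?:\.\d+)?$/ ? 1 : 0;

print "probes matching \\d: ", scalar(@matched), "\n";            # prints 0
print "'-3.14' matches digits-only: $digits_only\n";              # prints 0
print "'-3.14' matches signed-decimal pattern: $signed_decimal\n"; # prints 1
```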

Do you consider \D (Match non-digit character) to be as "dangerous to use"? — jjwdesign, May 07 '13 at 00:12
@ikegami: Using a flag will set the behaviour. But what is the default behaviour - where will Perl get its setting from if we don't set anything? — nhahtdh, May 07 '13 at 04:13
@jjwdesign: If you are using it in a validation regex, then you should review it. `\D` will match Unicode characters regardless of the flag you are using. — nhahtdh, May 07 '13 at 04:18

score 3 · Answer 3 · answered May 07 '13 at 04:17

The answer is no. It merely does a digit check. However, Unicode makes things a bit more complex.

If you want to make sure something is a number -- a decimal number -- take a look at the Scalar::Util module. One of the functions it provides is looks_like_number. This can be used to see whether the string you're looking at could be a number, and works better than trying to use a regular expression.

This module has been part of standard Perl for a while, so you should have it on your system.
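As a rough illustration of that suggestion, the sketch below runs looks_like_number over a few sample strings. The sample values and the helper name is_number are made up for the example; note that the function also accepts forms such as exponent notation, which may or may not be what a given validation needs.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper that normalises the result to 1 or 0.
sub is_number {
    my ($candidate) = @_;
    return looks_like_number($candidate) ? 1 : 0;
}

# Sample inputs: integers, signed decimals and exponents pass; trailing junk does not.
for my $candidate ('42', '-3.14', '1e5', '12abc') {
    printf "%-8s %s\n", "'$candidate'", is_number($candidate) ? 'number' : 'not a number';
}
```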
