I'm looking at some old Perl/CGI code to debug an issue and noticed a lot of uses of:

\d - Match digit character
\D - Match non-digit character

Most online docs say that \d is the same as [0-9], which is how I've always thought of it. But I've also noticed Stack Overflow questions that mention character-set differences.

Does "\d" in regex mean a digit?

Does \d also match a minus sign and/or decimal point?

I'm off to do some testing.

jjwdesign
  • Use `[+-]?\d+(?:\.\d+)?` to match numbers with an optional plus or minus sign and/or a decimal point ... – HamZa May 06 '13 at 23:42
  • I was thinking of something stricter, such as /^[0-9]+$/, which should match from start to end with one or more of 0-9. – jjwdesign May 06 '13 at 23:47
  • I thought you wanted to match decimal (+-) numbers :p If you only want to match digits, then `/^\d+$/` is fine. – HamZa May 06 '13 at 23:52
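A minimal sketch comparing the patterns from these comments. Anchoring HamZa's pattern with ^ and \z and adding the /a modifier are assumptions made here so that it validates a whole string of ASCII digits. The last pattern swaps jjwdesign's $ for \z, because in Perl $ also matches just before a trailing newline, so /^[0-9]+$/ accepts "42\n". The sample inputs are chosen for illustration only.

```perl
use strict;
use warnings;

# HamZa's pattern, with ^...\z anchors and /a added for this sketch:
# optional sign, ASCII digits, optional fractional part.
my $signed_decimal = qr/^[+-]?\d+(?:\.\d+)?\z/a;

# jjwdesign's stricter pattern, exactly as written in the comment.
my $digits_dollar = qr/^[0-9]+$/;

# Same pattern ending in \z instead of $, which also rejects a trailing newline.
my $digits_z = qr/^[0-9]+\z/;

for my $input ('42', '-42', '+3.14', '3.', "42\n") {
    (my $label = $input) =~ s/\n/\\n/;    # make the newline visible in the output
    printf "%-6s signed=%-3s digits(\$)=%-3s digits(\\z)=%s\n",
        $label,
        ($input =~ $signed_decimal ? 'yes' : 'no'),
        ($input =~ $digits_dollar  ? 'yes' : 'no'),
        ($input =~ $digits_z       ? 'yes' : 'no');
}
```

For plain CGI form input, the \z form is the safer choice when the value must contain nothing but digits.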

3 Answers

Does \d also match a minus sign and/or decimal point?

NO

Kent

I don't know how Perl determines whether to use Unicode, ASCII or locale rules by default (no flag, no use re pragma). Regardless, by declaring use re '/a'; (ASCII), use re '/u'; (Unicode) or use re '/l'; (locale), you clearly signal to the Perl interpreter (and to human readers) which mode you want and avoid unexpected behaviour.
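A minimal sketch of the use re pragma mentioned above, assuming Perl 5.14 or later (where the /a and /u modifiers and the use re '/flags' form exist). The test character U+0663 (ARABIC-INDIC DIGIT THREE, a Unicode decimal digit) is chosen here only for illustration. Each use re is lexically scoped, so it affects only the regexes compiled inside its own block.

```perl
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE: a Unicode decimal digit (category Nd).
my $arabic_three = "\x{0663}";

{
    use re '/u';    # Unicode rules for every regex in this block
    print "under /u: ", ($arabic_three =~ /^\d$/ ? "match" : "no match"), "\n";
}

{
    use re '/a';    # ASCII rules: \d is [0-9] only
    print "under /a: ", ($arabic_three =~ /^\d$/ ? "match" : "no match"), "\n";
}
```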

Depending on which modifier is in effect, \d has at least two meanings:

  • Under the /a flag (ASCII), \d matches only the digits 0 through 9, no more and no less.
  • Under the /u flag (Unicode), \d matches any decimal digit from any writing system and is equivalent to \p{Digit}. This makes \d+ pretty useless and dangerous for validation, since it accepts a mix of digits from different writing systems.

    Quote from the description of the /u flag:

    And, \d+ , may match strings of digits that are a mixture from different writing systems, creating a security issue. num() in Unicode::UCD can be used to sort this out. Or the /a modifier can be used to force \d to match just the ASCII 0 through 9.
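A small sketch of the problem the quote describes, using num() from Unicode::UCD and the /a modifier. The mixed string (ASCII "1" followed by U+0663 ARABIC-INDIC DIGIT THREE) is a hypothetical input chosen for illustration. Per the Unicode::UCD documentation, num() returns undef when a multi-character string is not made of decimal digits from a single script.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# Hypothetical input: ASCII "1" followed by U+0663 ARABIC-INDIC DIGIT THREE.
my $mixed = "1\x{0663}";

# Without /a, \d+ accepts the whole mixed-script string.
print "\\d+ matches: ", ($mixed =~ /^\d+$/ ? "yes" : "no"), "\n";

# With /a, only ASCII 0-9 count as digits, so the string is rejected.
print "\\d+ with /a matches: ", ($mixed =~ /^\d+$/a ? "yes" : "no"), "\n";

# num() refuses digits mixed from different scripts...
my $value = num($mixed);
print "num(mixed): ", (defined $value ? $value : "undef"), "\n";

# ...but converts a same-script string (U+0663 U+0664, Arabic-Indic "34").
print "num(arabic): ", num("\x{0663}\x{0664}"), "\n";
```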

\d will not match any sign or punctuation, since those characters do not belong to the Nd (Number, decimal digit) General Category of Unicode.
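A quick check of that claim, with sample characters chosen for illustration. It also shows that extracting digit runs from a signed decimal silently drops the sign and the decimal point.

```perl
use strict;
use warnings;

# Signs and punctuation are not in the Nd category, so \d never matches them.
for my $char ('-', '+', '.', ',', '7') {
    printf "'%s' %s \\d\n", $char, ($char =~ /\d/ ? 'matches' : 'does not match');
}

# Pulling digit runs out of a signed decimal skips the sign and the point.
my @runs = ('-3.14' =~ /\d+/g);
print "digit runs in '-3.14': @runs\n";    # the runs are "3" and "14"
```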

nhahtdh
  • Do you consider \D (Match non-digit character) to be equally "dangerous to use"? – jjwdesign May 07 '13 at 00:12
  • @ikegami: Using a flag sets the behaviour. But what is the default behaviour: where does Perl get its setting from if we don't set anything? – nhahtdh May 07 '13 at 04:13
  • @jjwdesign: If you are using it in a validation regex, then you should review it. `\D` will match non-ASCII characters regardless of which flag you use. – nhahtdh May 07 '13 at 04:18
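A minimal sketch of why \D deserves the same review, using two hypothetical inputs chosen for illustration. A non-ASCII letter such as é (U+00E9) matches \D in either mode, but whether a non-ASCII digit such as U+0663 matches \D depends on /a. So a check like "reject the input if it contains \D" lets Arabic-Indic digits through unless /a is in effect.

```perl
use strict;
use warnings;

# Hypothetical inputs for illustration:
#   "1" U+0663 "2"  -> ASCII digits around an ARABIC-INDIC DIGIT THREE
#   "12" U+00E9     -> ASCII digits followed by LATIN SMALL LETTER E WITH ACUTE
for my $input ("1\x{0663}2", "12\x{00E9}") {
    # Show the code points so the output stays plain ASCII.
    my $codes = join ' ', map { sprintf('U+%04X', ord($_)) } split //, $input;
    printf "%-22s \\D default: %-3s \\D with /a: %s\n",
        $codes,
        ($input =~ /\D/  ? 'yes' : 'no'),    # default rules: U+0663 counts as a digit
        ($input =~ /\D/a ? 'yes' : 'no');    # /a: anything outside [0-9] is a non-digit
}
```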

The answer is no. It merely does a digit check. However, Unicode makes things a bit more complex.

If you want to make sure something is a number (a decimal number), take a look at the Scalar::Util module. One of its functions is looks_like_number. It can be used to see whether the string you're looking at could be a number, and it works better than trying to use a regular expression.
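A short usage sketch of looks_like_number, with sample inputs chosen for illustration. Note that it answers "would Perl treat this as a number?", which is broader than "is this all digits?": a sign, a decimal point and exponent notation such as '1e5' are accepted, while a hex string like '0x1A' is not, since Perl does not numify hex strings.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# A few sample inputs, chosen for illustration.
for my $input ('42', '-3.14', '1e5', '0x1A', 'abc', '') {
    printf "%-8s %s\n", "'$input'",
        looks_like_number($input) ? 'looks like a number' : 'does not look like a number';
}
```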

This module has been part of standard Perl for a while, so you should have it on your system.

David W.