
I found that in 123, \d matches 1 and 3 but not 2. I was wondering what requirement a digit must satisfy to be matched by \d. I am talking about Python-style regex.

The Regular Expression plugin in gedit uses Python-style regex. I created a text file whose content is

123

Only 1 and 3 are matched by the regex \d; 2 is not.

More generally, in a run of digits with no other characters in between, only the odd-position digits are matched and the even-position digits are not. For example, in 12345 the matches are 1, 3 and 5.
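
A runnable test case that uses Python's own `re` module instead of the gedit plugin helps separate what `\d` means from how the editor reports matches. This is a minimal Python 3 sketch; the helper name `digit_matches` is just for illustration.

```python
import re

def digit_matches(text):
    """Return every single-character digit match in text, left to right."""
    # re.findall scans the whole string and returns all non-overlapping matches.
    return re.findall(r"\d", text)

for sample in ("123", "12345"):
    print(sample, "->", digit_matches(sample))
```

If Python reports every digit while the editor highlights only alternate ones, the difference lies in the plugin rather than in the pattern.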

Amal Murali
Tim
  • `\d` will match `1`, `2` and `3`. If it doesn't there must be something else in your expression. Can you show your full expression? – Alex Aza Jun 25 '11 at 17:37
  • `\d` is shorthand for `[0-9]`, so it ought to match `2`. Please post a complete test case (a script that can be run, which demonstrates your problem) and maybe we can figure out what's wrong. – zwol Jun 25 '11 at 17:37
  • @delnan: "I found that in 123, \d matches 1 and 3 but not 2" sounds pretty concrete to me. – Amber Jun 25 '11 at 17:41
  • @Amber: Damn me, I missed the not! –  Jun 25 '11 at 17:42
  • \d matches only 1 in 123. Try \d+ to match 123. – Jochen Ritzel Jun 25 '11 at 17:57
  • What happens if you put a space in between the 1 and the 2, and add a 4 immediately after the 3? (I suspect this is either a bug or a deliberate design decision in gedit's search-by-regexp mechanism.) – zwol Jun 25 '11 at 18:06
  • @Zack: for a sequence of digit numbers without other characters in between, only the odd order digits are matches, and the even order digits are not. For example in `12345`, the matches are `1` `3` and `5`. – Tim Jun 25 '11 at 18:10
  • Okay, I'm not posting this as an answer because I don't *know*, but I think what's going on is gedit refuses to start a new match immediately after the end of the previous match -- it skips one character, whatever it is, before trying to match again. Please try matching `11111` and `22222`. – zwol Jun 25 '11 at 18:34
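
zwol's guess can be illustrated with a search loop that, after each match, skips one extra character before searching again. `skip_after_match` is a hypothetical function, not gedit's actual code; it only shows that such a rule would reproduce the odd-position pattern, and what it would do with the inputs suggested in the comments.

```python
import re

def skip_after_match(pattern, text):
    """Hypothetical faulty search loop: after each match, resume one
    character past the end of the match instead of exactly at the end."""
    regex = re.compile(pattern)
    found = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        found.append((m.start(), m.group()))
        # A correct loop would continue at m.end(); skipping one extra
        # character is the suspected bug.
        pos = m.end() + 1
    return found

for sample in ("123", "12345", "11111", "22222", "1 2 34"):
    print(sample, "->", skip_after_match(r"\d", sample))
```

Under this rule, separating digits with spaces lets every digit match, because the skipped character is the space, while a run such as `34` still loses its second digit.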

6 Answers


[0-9] is not always equivalent to \d. In Python 3, [0-9] matches only the characters 0123456789, while \d matches those and other digit characters as well, for example the Eastern Arabic numerals ٠١٢٣٤٥٦٧٨٩.
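
A short Python 3 check of this claim. Python's `re` documentation describes `\d` in `str` patterns as matching any Unicode decimal digit (category [Nd]). The sample digits are built from code points U+0660..U+0669 (ARABIC-INDIC DIGIT ZERO to NINE) so the sketch does not depend on how the characters above are encoded; `count_matches` is an illustrative helper.

```python
import re
import unicodedata

# Arabic-Indic digits U+0660..U+0669 (the "Eastern Arabic numerals" above)
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
ASCII_DIGITS = "0123456789"

def count_matches(pattern, text, flags=0):
    """Number of non-overlapping matches of pattern in text."""
    return len(re.findall(pattern, text, flags))

for label, text in (("ASCII", ASCII_DIGITS), ("Arabic-Indic", ARABIC_INDIC)):
    # \d is Unicode-aware for str patterns; [0-9] is the literal range U+0030..U+0039.
    print(label, count_matches(r"\d", text), count_matches(r"[0-9]", text))

# The extra characters accepted by \d are Unicode decimal digits (category Nd).
print({unicodedata.category(c) for c in ARABIC_INDIC})
```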

wim
Kirill Polishchuk
  • Trying this in the REPL: `import re; re.match(r'\d', '٠١٢٣٤٥٦٧٨٩')` shows no match – nickf Jul 18 '12 at 10:48
  • For Persian and Arabic digits in Java and JavaScript engines, use \p{Nd} – Alireza Fattahi Nov 16 '13 at 13:57
  • That's not correct. Not all engines implement it like that. Oniguruma and Onigmo (used e.g. in Ruby) have \d == [0-9]; if you want other scripts' digits, you have to use \p{digit} – apeiros Apr 12 '14 at 16:53
  • +1, but mmmm... the OP's tag is `Python` and `\d` matches any Unicode digits only in Python3. In Python 2.7 it's still the old ASCII `[0-9]`—it could be worth clarifying that in the answer. :) – zx81 Jun 16 '14 at 11:02
  • PHP here: `preg_match()` finds no match for `١٢٣٤٥٦٧٨٩`, so `\d` == `[0-9]` – Zanshin13 Aug 11 '15 at 14:45
  • Is there any equivalent to \d if, say, we want to match 1 in English, Arabic and other scripts too? –  Dec 28 '17 at 19:20
  • @FarazAhmad, probably not, you have to specify all characters separately – Kirill Polishchuk Dec 28 '17 at 23:26
  • If you want `\d` to only match `[0-9]`, you can use the ASCII flag. E.g.: `re.search(r'\d', 'string_to_search', flags=re.ASCII)`. See: https://docs.python.org/3/library/re.html#re.ASCII – Caumons May 19 '20 at 08:19
  • You used Persian digits; just use the English ones, 0123456789 – BGOPC Jun 29 '23 at 14:27
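
The `re.ASCII` flag from Caumons's comment, and its inline form `(?a)`, can be sketched like this in Python 3; the mixed sample string is made up for illustration.

```python
import re

arabic_indic = "".join(chr(0x0660 + i) for i in range(10))  # U+0660..U+0669
mixed = "7" + arabic_indic[:3]  # one ASCII digit followed by three Arabic-Indic digits

default = re.findall(r"\d", mixed)                     # Unicode semantics (default for str)
ascii_only = re.findall(r"\d", mixed, flags=re.ASCII)  # \d now behaves like [0-9]
inline = re.findall(r"(?a)\d", mixed)                  # same effect via the inline flag

print(len(default), ascii_only, inline)
```

Per the `re` documentation, `bytes` patterns already treat `\d` as `[0-9]`, so the flag only changes behavior for `str` patterns.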

\d matches any single digit in most regex flavors, including Python's (see Regex Reference).

Nisarg Shah
Will

In Python-style regex, \d matches any individual digit. If you're seeing something that doesn't seem to do that, please provide the full regex you're using, as opposed to just describing that one particular symbol.

>>> import re
>>> re.match(r'\d', '3')
<_sre.SRE_Match object at 0x02155B80>
>>> re.match(r'\d', '2')
<_sre.SRE_Match object at 0x02155BB8>
>>> re.match(r'\d', '1')
<_sre.SRE_Match object at 0x02155B80>
Amber
  • Thanks! My regex engine is the Regular Expression plugin in gedit. The whole file content is `123`. – Tim Jun 25 '11 at 17:45

In Java, \d{3} (written "\\d{3}" inside a Java string literal) matches any sequence of three digits.

Amal Murali
srajan

This is just a guess: I think your editor actually matches every single digit (1, 2 and 3) but highlights only the odd-numbered matches, so that three adjacent one-digit matches can be told apart from a single match of the whole string 123.

Most regex consoles highlight adjacent matches in alternating colors, but because of the plugin settings, terminal limitations or some other reason, only every other match may be visibly highlighted in your case.
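
This explanation can be checked against Python's `re`: `finditer` reports three adjacent one-character matches in `123`. The `alternate_highlight` helper is hypothetical, a sketch of a highlighter that visibly marks only every other match, which would look exactly like the reported behavior.

```python
import re

def match_spans(pattern, text):
    """(start, end) span of every non-overlapping match, in order."""
    return [m.span() for m in re.finditer(pattern, text)]

def alternate_highlight(pattern, text):
    """Hypothetical renderer: wrap the 1st, 3rd, 5th, ... match in brackets
    and leave the others unmarked, as if the second color were invisible."""
    out, last = [], 0
    for i, (start, end) in enumerate(match_spans(pattern, text)):
        out.append(text[last:start])  # unmatched text before this match
        piece = text[start:end]
        out.append(f"[{piece}]" if i % 2 == 0 else piece)
        last = end
    out.append(text[last:])  # trailing unmatched text
    return "".join(out)

print(match_spans(r"\d", "123"))
print(alternate_highlight(r"\d", "12345"))
```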

Doghouse87

Info regarding .NET / C#:

Decimal digit character: \d

\d matches any decimal digit. It is equivalent to the \p{Nd} regular expression pattern, which includes the standard decimal digits 0-9 as well as the decimal digits of a number of other character sets.

If ECMAScript-compliant behavior is specified, \d is equivalent to [0-9]. For information on ECMAScript regular expressions, see the "ECMAScript Matching Behavior" section in Regular Expression Options.

Info: https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-classes-in-regular-expressions#decimal-digit-character-d

juFo