
I understand the output of the code below:

import re
text = "streets2345"
pattern = r"\d+"
match = re.search(pattern, text)
print(match.group(0))

Output: 2345


However, I do not understand why the code below prints an empty string.

import re
text = "streets2345"
pattern = r"\d*"
match = re.search(pattern, text)
print(match.group(0))

Output: (an empty string)

Here, the first character s of the text matches the pattern \d*.

So why is the output empty rather than s?

Wiktor Stribiżew
meallhour
  • `Here, the first character s of the text matches the pattern \d*`: No, it doesn't match `s`. `\d*` just matches the position before `s`, which is of **zero width**, hence nothing is shown in the output. – anubhava Sep 21 '22 at 20:22
  • See the matches here and look at the *first match* under MATCH INFORMATION: https://regex101.com/r/ERNx7D/1 It matches at the first position because the digits are optional. – The fourth bird Sep 21 '22 at 20:23
  • My understanding is that every character is matched one at a time from left to right. So how does `\d*` match the position before `s`? Can you please share more details? – meallhour Sep 21 '22 at 20:27
  • It matches the position, as in "at the current position, it can match 0 digits". – The fourth bird Sep 21 '22 at 20:38
  • Can we say that having a non-digit character at the first position is equivalent to having 0 digits at the first position? If yes, then 's' matches '\d*', right? – meallhour Sep 21 '22 at 20:53

1 Answer


\d* will match 0 or more digits. 's' is not a digit, but it will match the position before the 's' as there's 0 digits. Thus the first match will be empty (an empty string, which Python prints as a blank line, not null). In fact, if you collect all the matches rather than only the first, the first 7 will be empty for the same reason, the last of them being at the position before the last 's' in "streets". The 8th match (index 7) will be "2345".

\d+ will match 1 or more digits. As you don't have a digit before the first 's' (again, there's 0 digits), you won't get a match there, so the search keeps moving right until it reaches "2345".

If \d* didn't match the empty 0-digit positions before each letter, what would be the difference between \d* and \d+?
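To see this for yourself, here is a small sketch that lists every match instead of only the first one. Using re.findall is my choice for the demonstration (the question only uses re.search), and the variable names are just for illustration:

```python
import re

text = "streets2345"

# re.search stops at the first match, which is the zero-width
# match at position 0 (right before the first 's').
first = re.search(r"\d*", text)
print(repr(first.group(0)), first.span())  # '' (0, 0)

# re.findall returns every match, including the zero-width ones
# found before each of the seven letters.
star_matches = re.findall(r"\d*", text)
print(star_matches)

# \d+ needs at least one digit, so no empty matches appear.
plus_matches = re.findall(r"\d+", text)
print(plus_matches)  # ['2345']
```

The first seven entries of star_matches are empty strings and the entry at index 7 is "2345", while plus_matches contains only "2345".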

VOL
  • Can you please help me understand `but it will match the position before the 's' as there's 0 digits.`? – meallhour Sep 21 '22 at 20:36
  • You can think of the space before the first 's' as an empty position. When you use `\d*` there, you are asking "are there 0 or more consecutive digits starting from this position?", and the answer is "yes, there are 0 digits". That's why the regex matches there. – VOL Sep 21 '22 at 20:40
  • And as said, if the regex didn't consider that empty position a match for 0 digits, then what would 0 digits look like? – VOL Sep 21 '22 at 20:43
  • Here 's' is a non-digit char, so is that not equivalent to having 0 digits? – meallhour Sep 21 '22 at 21:01
  • But if `\d*` matched 's', then by that logic it would match basically everything and be the same as `.*`. When you start looking at the string from left to right and ask "is there a group of 0 or more characters **that are digits** starting from here?", you will find that yes, there are no digits, so this is the starting position of such a group. But up to which point? When you proceed and encounter the 's', you'll conclude that it is not a digit, so the group of 0 or more characters **that are digits** ends. That's why you get a group of 0 digits that the 's' breaks. – VOL Sep 21 '22 at 21:14
  • With `\d*` you are not matching non-digits (including 's') but rather 0 or more characters that are digits. Matching non-digits would be done with `\D`. – VOL Sep 21 '22 at 21:14