Need help to understand the star quantifier (*) output

Question

I understand the output of the code below:

```python
import re
text = "streets2345"
pattern = r"\d+"
match = re.search(pattern, text)
print(match.group(0))
```

Output: 2345

However, I do not understand why the code below returns null (an empty string).

```python
import re
text = "streets2345"
pattern = r"\d*"
match = re.search(pattern, text)
print(match.group(0))
```

Output: null

Here, the first character s of the text matches the pattern \d*.

So why is the output not s instead of null?

`Here, the first character s of the text matches the pattern \d*`: No, it doesn't match `s`. `\d*` just matches the position before `s`, which is of **zero width**; hence nothing is shown in the output. – anubhava, Sep 21 '22 at 20:22
See the matches here and look at the *first match* under MATCH INFORMATION: https://regex101.com/r/ERNx7D/1 It matches at the first position because the digits are optional. – The fourth bird, Sep 21 '22 at 20:23
My understanding is that every character is matched one at a time from left to right. So how come `\d*` matches the position before `s`? Can you please share more details? – meallhour, Sep 21 '22 at 20:27
It matches the position, as in "on the current position, it can match 0 digits" – The fourth bird, Sep 21 '22 at 20:38
Can we say that having a non-digit char at the first position is equivalent to having 0 digits at the first position? If yes, then 's' matches '\d*', right? – meallhour, Sep 21 '22 at 20:53

VOL · Accepted Answer · 2022-09-21T20:32:31.713

0

\d* matches 0 or more digits. 's' is not a digit, but the pattern still matches at the position before the 's', because there are 0 digits there. Thus the first match is null (empty), and that first match is the one re.search returns. In fact, if you collect all the matches (for example with re.findall), the first 7 are empty for the same reason, the last of them being at the position before the last 's' in "streets". The 8th match (index 7) is "2345".
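
To see those zero-width matches for yourself, here is a small sketch that lists every match of \d* together with its start and end index. The variable names are only for illustration, and the trailing empty match at the very end of the string is what Python 3.7 and later report; older versions may differ there.

```python
import re

text = "streets2345"

# Collect every match of \d* together with its (start, end) span.
# An empty match has start == end, i.e. it is zero width.
matches = [(m.group(0), m.span()) for m in re.finditer(r"\d*", text)]

for value, span in matches:
    print(repr(value), span)
```

The first seven entries are empty strings with spans (0, 0) through (6, 6), one in front of each letter of "streets", and the entry at index 7 is "2345" with span (7, 11).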

\d+ matches 1 or more digits. Because there is no digit at the position before the first 's' (again, there are 0 digits), there is no match at that position, so the search moves on until it reaches "2345".

If \d* didn't match the empty 0-digit positions before each letter, what would be the difference between \d* and \d+?
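
A quick way to compare the two quantifiers is to look at where re.search stops for each of them. This is just a sketch using the string from the question; `star` and `plus` are names I made up for the example.

```python
import re

text = "streets2345"

star = re.search(r"\d*", text)  # 0 digits is acceptable, so it succeeds at index 0
plus = re.search(r"\d+", text)  # needs at least 1 digit, so it skips ahead to index 7

print(repr(star.group(0)), star.span())
print(repr(plus.group(0)), plus.span())
```

The \d* search reports an empty string with span (0, 0), while the \d+ search reports "2345" with span (7, 11).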

edited Sep 21 '22 at 20:32

answered Sep 21 '22 at 20:26


Can you please help me understand `but it will match the position before the 's' as there's 0 digits.`? – meallhour Sep 21 '22 at 20:36
You can think of the space before the first 's' as an empty position. It being empty, when you use `\d*` you ask a question of "are there 0 or more consecutive digits starting from this position?" and you find that the answer to that is "yes! There's 0 digits". So that's why the regex will match there. – VOL Sep 21 '22 at 20:40
And as said, if the regex didn't consider that empty position a match for 0 digits, then what would 0 digits look like? – VOL Sep 21 '22 at 20:43
Here 's' is a non-digit char, so is that not equivalent to having 0 digits? – meallhour Sep 21 '22 at 21:01
But if `\d*` matched 's', then by that logic it would match basically everything and be the same as `.*`. When you start looking at the string from left to right and ask "is there a group of 0 or more characters **that are digits** starting from here", you will find that yes, there are no digits, so this is the starting position of such a group. But up to which point? When you proceed and encounter the 's', you conclude that it is not a digit, so the group of 0 or more characters **that are digits** ends. That's why you get a group of 0 digits that the 's' breaks. – VOL Sep 21 '22 at 21:14
With `\d*` you are not matching non-digits (including 's') but rather 0 or more characters that are digits. Matching non-digits would be done with `\D`. – VOL Sep 21 '22 at 21:14
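
To make the contrast from the comments above concrete, here is a rough sketch that anchors three patterns at the start of the string with re.match. Note that it uses `\D*` (a run of non-digits) rather than a single `\D`, which is my own choice for the illustration, as are the variable names.

```python
import re

text = "streets2345"

# Each pattern is tried only at index 0 of the string.
digits_star = re.match(r"\d*", text).group(0)  # 0 or more digits
non_digits = re.match(r"\D*", text).group(0)   # 0 or more non-digits
anything = re.match(r".*", text).group(0)      # 0 or more of any character

print(repr(digits_star))
print(repr(non_digits))
print(repr(anything))
```

Only the non-digit pattern consumes the 's'; `\d*` stops before it with an empty match, and `.*` takes the whole string.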
