
I have a String with a number in it:

dfdf00023546546

I want to get only the number:

(0*)(\d+) works
(0*)(\d*) doesn't work
(0*)(\d*$) works 

If plus means 1 or more and asterisk means 0 or more, isn't * supposed to catch more than +? And why does adding the $ sign make it work?

Thanks

Nadav

3 Answers


The problem is most likely that g (global) mode is not set. Without it, only the first match is returned, and for (0*)(\d*) that first match is a zero-length match at the very start of the string. If you set global mode, you will see that the expected substring is matched as well.

(0*)(\d*) does match, but in g mode it returns several matches rather than one, because both groups are quantified with * and so the pattern as a whole can match the empty string.

The + quantifier requires at least one occurrence of the preceding token, so (0*)(\d+) can only match where a digit actually exists. It therefore never returns a zero-length match.

Your third try, (0*)(\d*$), behaves like the + version because no match, not even a zero-length one, can succeed before the run of digits that reaches the end of the input string. With this regex, however, there is still one zero-length match at the very end of the string when g mode is on.
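
As a rough illustration of the difference the g flag makes, here is a minimal sketch. It assumes the regex is being run in JavaScript, which the question does not state, and the variable names are only for illustration.

```javascript
const input = "dfdf00023546546";

// Without the g flag, match() returns only the first match.
// (0*)(\d*) succeeds immediately at index 0 with an empty match.
const firstStar = input.match(/(0*)(\d*)/);
console.log(JSON.stringify(firstStar[0]), firstStar.index); // "" 0

// (0*)(\d+) needs at least one digit, so its first match starts at index 4.
const firstPlus = input.match(/(0*)(\d+)/);
console.log(firstPlus[0], firstPlus.index); // 00023546546 4

// With the g flag, match() returns every match, including the empty ones.
const allStar = input.match(/(0*)(\d*)/g);
console.log(allStar);
```

So the * version is not failing; its first match is simply an empty one that sits before the number.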

revo

This can be hard to see at first, but your regexes behave roughly as follows:

  • (0*)(\d+) returns a single match, 00023546546.
  • (0*)(\d*$) returns 2 matches: 00023546546 and an {empty} match at the end of the string. The second match occurs because zero or more occurrences of 0 can be {empty}, zero or more occurrences of digits 0-9 can again be {empty}, and the end-of-string check still succeeds.
  • (0*)(\d*), on the other hand, matches at 6 different positions, because technically a match can be {empty} according to your regex: an {empty} match before each of the 4 letters, one non-empty match that returns your number, and one end-of-string match, which is again {empty}.
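
The match counts in the list above can be checked with a short sketch. It assumes a JavaScript engine that supports `String.prototype.matchAll`; `listMatches` is a hypothetical helper written for this example, not something from the question.

```javascript
// Hypothetical helper: list every match of a pattern as [index, text] pairs.
function listMatches(source, text) {
  const regex = new RegExp(source, "g"); // matchAll requires the g flag
  return Array.from(text.matchAll(regex), (m) => [m.index, m[0]]);
}

const text = "dfdf00023546546";

const plusMatches = listMatches("(0*)(\\d+)", text);
const anchoredMatches = listMatches("(0*)(\\d*$)", text);
const starMatches = listMatches("(0*)(\\d*)", text);

console.log(plusMatches);     // one match, at index 4
console.log(anchoredMatches); // the digits at index 4, then an empty match at index 15
console.log(starMatches);     // empty matches at 0..3, the digits at 4, an empty match at 15
```

Printing the index next to each match makes it easier to see where the empty matches come from.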
hungersoft

Please remember that a regex does not only match characters; it can also produce 0-length matches.

(0*)(\d*) in fact works, it's just that it matches the stuff you want plus some empty matches:

[ '', '', '', '', '00023546546', '' ]

See those 0-length matches?

Now I'll explain why those 0-length matches are there. Your regex says that there should be 0 or more 0s, followed by 0 or more digits. This means that it can match 0 0s and 0 digits, doesn't it? So the position before each letter, and the position at the very end of the string, is matched because that "substring" has exactly 0 0s and 0 digits!

By the way, (0*)(\d*$) will only work if the number is at the end of the string.
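
To illustrate that last point, here is a small sketch, again assuming JavaScript. The input with trailing letters is a made-up variation of the string from the question.

```javascript
// Hypothetical input where the digits are followed by more letters.
const sample = "dfdf00023546546xyz";

// Anchored pattern: the digits are not at the end, so the only place
// where \d*$ can succeed is the very end of the string, with an empty match.
const anchored = sample.match(/(0*)(\d*$)/);
console.log(JSON.stringify(anchored[0]), anchored.index); // "" 18

// Requiring at least one digit finds the number wherever it sits.
const plus = sample.match(/(0*)(\d+)/);
console.log(plus[0], plus[1], plus[2]); // 00023546546 000 23546546
```

If the number can appear anywhere in the string, the + version is the safer choice of the two.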

Sweeper