While matching a regular expression, the interpreter first attempts a match at index 0 in the string.
- If there's no match, it advances to the next index and tries again.
- If there's a match, it returns it and then it tries to match again at the end of the match. If the last match matched the empty string, it advances to the next character.
This repeats until the end of the string, moving forward one match at a time (when there's a match) or one character at a time (when there isn't).
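The loop described above can be sketched in Python. This is only an illustration under my own assumptions: the original doesn't say which language or engine is in use, `find_all` is a hypothetical helper, and real engines are not necessarily implemented this way. It relies on the compiled pattern's `match(text, pos)` method to attempt a match at a given cursor position.

```python
import re


def find_all(pattern, text):
    """Hypothetical helper: collect matches by walking a cursor through text."""
    regex = re.compile(pattern)
    matches = []
    pos = 0
    # The cursor may sit after the last character, hence <= and not <.
    while pos <= len(text):
        m = regex.match(text, pos)  # try to match exactly at the cursor
        if m is None:
            pos += 1  # no match here: advance one character
        else:
            matches.append(m.group())
            # Continue at the end of the match, but step forward one
            # character after an empty match so the loop cannot stall.
            pos = m.end() if m.end() > pos else pos + 1
    return matches


result = find_all("d*", "dddxdddd")
print(result)
```

With the pattern and string discussed here, this sketch should produce the same four matches as the walkthrough, including the two empty ones.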
The issue with the regex `d*` is that it accepts an empty match: the empty string matches the pattern. This implies you'll always get a match.
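To see that `d*` accepts the empty string, here is a small check using Python's `re` module (Python is my choice for illustration; the variable names are arbitrary):

```python
import re

pattern = re.compile("d*")

# Zero repetitions of "d" are allowed, so the empty string matches in full.
empty = pattern.fullmatch("")

# At a position where the next character is not a "d", the pattern still
# succeeds, with a zero-length match.
at_x = pattern.match("x")

print(empty is not None)
print(at_x.span())
```

The second match has identical start and end positions, which is what an empty match looks like.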
Let's try the `d*` pattern on the `dddxdddd` string.
Here's the initial position:
dddxdddd matches: []
^
The `^` really means the cursor is before the first `d`. You should think of the cursor as being between two characters in the string. This will help you understand the matching process.
So let's just insert fictional spaces to illustrate that:
d d d x d d d d matches: []
^
We get a first match here, as the first character is a `d`:
dddxdddd
\_/
After the match, we place the cursor where the match ended, between the `d` and the `x`:
d d d x d d d d matches: ["ddd"]
     ^
And we try to match again. The next character is an `x`, but `d*` allows zero repetitions, so the match succeeds with the empty string between the `d` and the `x`. As we get an empty match, we advance the cursor by one character:
d d d x d d d d matches: ["ddd", ""]
       ^
We then try to match again, and we get the `dddd` substring:
dddxdddd
    \__/
We place the cursor after it:
d d d x d d d d matches: ["ddd", "", "dddd"]
               ^
So it's now between the last `d` and the end of the string. Again, we attempt a match, and we succeed with an empty string:
d d d x d d d d matches: ["ddd", "", "dddd", ""]
               ^
If we try to advance the cursor, it will now be past the end of the string, which means we've found all matches and we're done.
End result:
["ddd", "", "dddd", ""]
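The cursor positions from the walkthrough can be checked against a real engine. The snippet below assumes Python 3.7 or later and uses `re.finditer` to list where each match starts and ends. The `d+` comparison at the end is my own addition, to show what changes when the pattern can no longer match the empty string.

```python
import re

text = "dddxdddd"

# Each tuple is (start, end, matched text). An empty match has start == end.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer("d*", text)]
for start, end, value in spans:
    print(start, end, repr(value))

# For comparison: requiring at least one "d" rules out the empty matches.
non_empty = re.findall("d+", text)
print(non_empty)
```

The two zero-length spans should sit at index 3 (between the third `d` and the `x`) and at index 8 (the end of the string), which are the two places where the walkthrough recorded an empty match.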