regex - documentation on empty strings before and after characters

Question

In reference to this question: https://softwareengineering.stackexchange.com/questions/291273/why-regex-when-using-global-search-and-0-quantifier-match-the-end-of-the-st and "Regular expression to match a line that doesn't contain a word?"

The explanation about empty strings before and after each letter is new to me; this is the first time I've heard of it. Where can I read up on it? I'm a little confused and can't find any other source on this.

It's not that there are "empty strings before and after each letter". The answer to your query is in the answers to the question you linked... _"Your regex matches the '' empty string with `d*`, because the `*` quantifier means zero or many times and that's zero `d` here."_ — msanford, Oct 14 '15 at 19:42

Lucas Trzesniewski · Accepted Answer · 2015-10-14T22:26:05.573

While matching a regular expression, the interpreter first attempts a match at index 0 in the string.

If there's no match, it advances to the next index and tries again.
If there's a match, it returns it and then tries to match again starting where that match ended. If the match was the empty string, it first advances to the next character, since it would otherwise find the same empty match again.

And so on, for each match (when matched), or each character (when there's no match).
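The procedure above can be sketched in a few lines of Python. This is only a simplified model of the steps just described, not how any particular regex engine is implemented; the helper name `find_all` is made up for this illustration.

```python
import re

def find_all(pattern, text):
    """Hypothetical helper: scan text left to right, collecting matches."""
    regex = re.compile(pattern)
    matches = []
    pos = 0
    # pos may equal len(text): that is the cursor position after the last character.
    while pos <= len(text):
        m = regex.match(text, pos)  # try to match exactly at the cursor
        if m is None:
            pos += 1                # no match here: move to the next index
        else:
            matches.append(m.group(0))
            if m.end() == pos:
                pos += 1            # empty match: step forward to avoid looping
            else:
                pos = m.end()       # resume where the match ended
    return matches

print(find_all(r"d*", "dddxdddd"))  # ['ddd', '', 'dddd', '']
print(find_all(r"d+", "dddxdddd"))  # ['ddd', 'dddd']
```

With `d+` instead of `d*` the pattern can no longer match the empty string, so the two empty entries disappear.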

The issue with the regex d* is that it accepts an empty match: the empty string matches the pattern. This means a match attempt succeeds at every position, even where there is no d.
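A quick way to see this is to run the pattern against a string that contains no d at all. The snippet below uses Python's standard `re` module; the sample string `xyz` is just an assumption for the demonstration.

```python
import re

no_d = "xyz"  # sample input with no 'd' in it

first = re.search(r"d*", no_d)          # still succeeds, with an empty match
star_matches = re.findall(r"d*", no_d)  # one empty match per cursor position
plus_matches = re.findall(r"d+", no_d)  # 'd+' needs at least one 'd'

print(first.span())   # (0, 0)
print(star_matches)   # ['', '', '', '']
print(plus_matches)   # []
```

The four empty matches correspond to the four cursor positions in a three-character string: before each character and after the last one.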

Let's try the d* pattern on the dddxdddd string:

Here's the initial position:

dddxdddd     matches: []
^

The ^ really means the cursor is before the first d. You should think of the cursor as being between two characters in the string. This will help you understand the matching process.
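One way to picture these cursor positions in code, as a rough sketch: a string of length n has n + 1 positions, and the slice `text[i:i]` at any of them is the empty string. The variable names here are chosen only for this illustration.

```python
text = "dddxdddd"

# Cursor positions run from 0 (before the first character)
# to len(text) (after the last one).
positions = list(range(len(text) + 1))

# Each position splits the string into what is left and right of the cursor.
splits = [(text[:i], text[i:]) for i in positions]

# A zero-width slice at any position is the empty string.
empties = [text[i:i] for i in positions]

print(len(positions))  # 9
print(splits[3])       # ('ddd', 'xdddd')
print(set(empties))    # {''}
```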

So let's just insert fictional spaces to illustrate that:

 d d d x d d d d     matches: []
^

We get a first match here: the first character is a d, and the greedy * consumes all three d's:

dddxdddd
\_/

After the match, we place the cursor where the match ended, between the d and the x:

 d d d x d d d d     matches: ["ddd"]
      ^

And we try to match again. The match succeeds with the empty string between the d and the x. As we get an empty match, we advance the cursor:

 d d d x d d d d     matches: ["ddd", ""]
        ^

We then try to match again, and we get the dddd substring:

dddxdddd
    \__/

We place the cursor after it:

 d d d x d d d d     matches: ["ddd", "", "dddd"]
                ^

So it's now between the last d and the end of the string. Again, we attempt a match, and we succeed with an empty string:

 d d d x d d d d     matches: ["ddd", "", "dddd", ""]
                  ^

If we try to advance the cursor, it will now be past the end of the string, which means we've found all matches and we're done.

End result:

["ddd", "", "dddd", ""]
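For comparison, this is what Python's built-in `re` module reports for the same pattern and string. The spans show where each match starts and ends, so the two empty matches can be located precisely. Other regex engines may treat empty matches slightly differently, so take this as one engine's behaviour rather than a universal rule.

```python
import re

text = "dddxdddd"

values = re.findall(r"d*", text)
spans = [m.span() for m in re.finditer(r"d*", text)]

print(values)  # ['ddd', '', 'dddd', '']
print(spans)   # [(0, 3), (3, 3), (4, 8), (8, 8)]
```

The span (3, 3) is the empty match between the third d and the x, and (8, 8) is the one at the very end of the string.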

it finally makes sense. thank you for the example, and the bulleted points..:-) — airnet, Oct 14 '15 at 21:18
