I've refactored your code to make it easier to visualize what's going on.
my $regex = qr/
    M?        # 1: Match M, greedy, 0 or 1 times.
    [VI]?     # 2: Match V or I, greedy, 0 or 1 times.
    A?        # 3: Match A, greedy, 0 or 1 times.
    R?        # 4: Match R, greedy, 0 or 1 times.
    G?        # 5: Match G, greedy, 0 or 1 times.
    D?        # 6: Match D, greedy, 0 or 1 times.
    [LM]?     # 7: Match L or M, greedy, 0 or 1 times.
    G?        # 8: Match G, greedy, 0 or 1 times.
    [IVMAL]?  # 9: Match I, V, M, A or L, greedy, 0 or 1 times.
    E?        # 10: Match E, greedy, 0 or 1 times.
/x;
my $text = "VMVARGDLGVE";
if ( my @matched = $text =~ /$regex/gx ) {
    print "No of matches: ", scalar(@matched), "\n";
    print "<<$_>>\n" foreach @matched;
}
Your regular expression is matching in exactly as many ways as it possibly can without breaking the rules of its NFA regexp engine.
One of those rules is "leftmost": of all the places a match could
start, the one closest to the left in the target that allows the
total match to succeed will be chosen.
Another rule is that greedy quantifiers will match as much as
possible, and will only give up part of the submatch they own if
that becomes necessary to allow the full match to succeed. Giving up
part of their match involves backtracking, which happens only when a
quantifier has taken too much and must relinquish some of it.
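To make the greedy rule concrete, here is a minimal sketch using a made-up target and pattern (neither comes from your code). The a* grabs all three characters first, then hands one back, and only one, so that the trailing (a) can match.

```perl
use strict;
use warnings;

# Hypothetical target and pattern, chosen only to show greedy give-back.
my $target = "aaa";

# (a*) first takes "aaa"; (a) then has nothing left to match, so the
# engine backtracks and (a*) releases exactly one character.
my ( $greedy, $last ) = $target =~ /^(a*)(a)$/;

print "greedy part: <<$greedy>>, final part: <<$last>>\n";
# greedy part: <<aa>>, final part: <<a>>
```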
And another rule is that with the /g modifier, each successive match
resumes at the point where the previous match left off.
Walking through your target string, "V" matches the subpattern [VI]?
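(hereafter referred to as subpattern #2). The greedy quantifier holds onto that 'V', and will only release it if it is forced to later on, for the greater good.
"M" from the target string matches subpattern #7, and the next "V" matches subpattern #9; the subpatterns in between match zero times. The following character is 'A', which the only remaining subpattern (#10, E?) cannot match, so it matches zero times as well. The first iteration is done, having matched "VMV".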
Now the remainder of your target string looks like "ARGDLGVE". The pos marker is at 3 (the 4th character in the target string), so matching on the second iteration begins there. 'A' matches at subpattern #3, 'R' matches at #4, 'G' matches at #5, 'D' matches at #6, 'L' matches at #7, 'G' matches at #8, 'V' matches at #9, and 'E' matches at #10. The second iteration is done, having matched 'ARGDLGVE' from the target string.
On the third iteration, the pos marker is at 11, which is after the last character in the target string. So the empty string is compared against your regular expression. Because every quantifier in your regexp is "0 or 1", it is acceptable for the regular expression to match an empty string. So the third iteration is done, having matched "" (empty string).
You got three matches: "VMV", "ARGDLGVE", and "".
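If you'd like to watch those three iterations happen, here is a minimal sketch that runs the same pattern in a while loop and records where each match starts and ends. The pattern is the one above with the comments stripped; the variable names and the use of @- and @+ for the offsets are my own choices for illustration.

```perl
use strict;
use warnings;

my $regex = qr/M?[VI]?A?R?G?D?[LM]?G?[IVMAL]?E?/;
my $text  = "VMVARGDLGVE";

my @seen;
while ( $text =~ /($regex)/g ) {
    # $-[0] and $+[0] hold the start and end offsets of the whole match.
    push @seen, sprintf "<<%s>> from %d to %d", $1, $-[0], $+[0];
}
print "$_\n" for @seen;
# <<VMV>> from 0 to 3
# <<ARGDLGVE>> from 3 to 11
# <<>> from 11 to 11
```

The loop stops on its own after the empty match, because Perl will not match the empty string twice at the same position.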
One thing you may wish to do is to take control of the pos marker. Put your regular expression in a while loop, and at the end of each iteration set pos to one position past where the current match started, so the next attempt begins at the following character rather than where the previous match ended. But that only solves the problem you are having with the third rule mentioned above. You still will have the problem that quantifiers do very specific things, and don't violate their own rules just because you think it would be convenient.
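A rough sketch of that loop might look like the following. The @found array, the $start variable, and the decision to skip empty matches are my assumptions about what you'd want, not something taken from your code.

```perl
use strict;
use warnings;

my $regex = qr/M?[VI]?A?R?G?D?[LM]?G?[IVMAL]?E?/;
my $text  = "VMVARGDLGVE";

my @found;
while ( $text =~ /($regex)/g ) {
    my $start = $-[0];                    # where this match began
    push @found, $1 if length $1;         # ignore empty matches
    last if $start >= length $text;       # nothing left to try
    pos($text) = $start + 1;              # retry one character further on
}
print "<<$_>>\n" for @found;
```

With your target string this yields one overlapping match per starting position, beginning with "VMV" at offset 0 and "MVARGDLGVE" at offset 1. Each of those is still just the single leftmost, greedy match available from that position.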
The point is that the regular expression engine is not a permutation engine. Its job is to determine if a given target string matches a given pattern, following a set of well-defined (though sometimes confusing) rules.
I'm not sure what the bigger picture problem is that you're trying to solve. If you're simply trying to expand a set of ranges, you might have better success with the CPAN module String::Range::Expand. There are probably other CPAN modules that could do range expansion for you too, but this one could be a good starting point.