12

I have an issue with lazy quantifiers. Or most likely I misunderstand how I am supposed to use them.

Testing on Regex101, my test string is, let's say: 123456789D123456789

.{1,5} matches 12345

.{1,5}? matches 1

I am OK with both matches.

.{1,5}?D matches 56789D !! I would expect it to match 9D

Thanks for clarifying this.

Thomas Ayoub
A.D.
  • 2
    I didn't even know the `{x, y}` quantifier could be made lazy, thanks for asking this question ! – Aaron Mar 11 '16 at 16:13

3 Answers

18

First and foremost, please do not think of greediness and laziness in regex as a means of getting the longest/shortest match. The terms "greedy" and "lazy" only pertain to the rightmost character a pattern can match; they have no impact on the leftmost one. A lazy quantifier guarantees that the end of your matched substring will be the first one found, not the last one found (which is what a greedy quantifier would return).
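A minimal sketch of this point in Python's `re` module. The input `1D2D` is an illustrative string, not one from the question; it is chosen so that there are two possible end points for the same start.

```python
import re

# Illustrative input (an assumption, not from the question):
# two possible end points ("D") for a match starting at "1".
text = "1D2D"

greedy = re.search(r"1.*D", text)   # greedy: extend as far right as possible
lazy = re.search(r"1.*?D", text)    # lazy: stop at the first possible end

# Both matches start at the same leftmost position...
print(greedy.start(), lazy.start())   # 0 0
# ...and differ only in where they end.
print(greedy.group(), lazy.group())   # 1D2D 1D
```

The quantifier's mode changes only the right edge of the match; the left edge is decided by the left-to-right scan.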

The regex engine analyzes a string from left to right. So, it searches for the first character that meets the pattern and then, once it finds the matching substring, it is returned as a match.

Let's see how the engine parses the string with .{1,5}?D: 1 is matched and D is tested for. There is no D after 1, so the regex engine expands the lazy quantifier to match 12 and tries to match D again. There is 3 after 2, so the engine expands the lazy dot again, and keeps doing so up to 5 characters. After expanding to the maximum, it has matched 12345 and the next character is still not D. Since the engine has reached the quantifier's upper limit, the match fails at this location and the next location is tested.

The same scenario happens with the locations up to 5. When the engine reaches 5, it tries to match 5D, fails, tries 56D, fails, 567D, fails, 5678D - fails again, and when it tries to match 56789D - Bingo! - the match is found.
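The walkthrough above can be sketched as a small simulation. `trace_lazy_match` is a hypothetical helper written for illustration only; it mimics the order of attempts described here and is not how any real regex engine is implemented.

```python
import re

def trace_lazy_match(text, max_len=5, tail="D"):
    """Mimic the attempt order for `.{1,max_len}?` followed by `tail`.

    Hypothetical helper for illustration; real engines are more elaborate.
    """
    attempts = []
    for start in range(len(text)):             # leftmost start position first
        for length in range(1, max_len + 1):   # lazy: shortest expansion first
            end = start + length
            if end > len(text):
                break
            attempts.append(text[start:end] + tail)  # what the engine hopes to see
            if text.startswith(tail, end):
                return text[start:end + len(tail)], attempts
    return None, attempts

subject = "123456789D123456789"
match, attempts = trace_lazy_match(subject)
print(match)          # 56789D
print(attempts[-5:])  # ['5D', '56D', '567D', '5678D', '56789D']

# Cross-check against the real engine.
assert match == re.search(r".{1,5}?D", subject).group()
```

The outer loop (start position) only advances after the inner loop (lazy expansion) has exhausted all five lengths, which is why the first success is `56789D` rather than `9D`.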

This makes it clear that a lazily quantified subpattern at the beginning of a pattern will act "greedily" by default, that is, it will not match the shortest substring.
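As a quick check in Python's `re` module (assuming its behavior here is the same as the PCRE flavor used on Regex101), the greedy and lazy versions return the same match when the quantified part comes first:

```python
import re

subject = "123456789D123456789"

# Greedy and lazy variants of the same pattern.
greedy = re.search(r".{1,5}D", subject)
lazy = re.search(r".{1,5}?D", subject)

print(greedy.group())  # 56789D
print(lazy.group())    # 56789D

# Without the trailing D, laziness is visible again.
print(re.search(r".{1,5}", subject).group())   # 12345
print(re.search(r".{1,5}?", subject).group())  # 1
```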

Here is a visualization from regex101.com:

[Image: regex101.com debugger view of the matching steps]

Now, here is a fun fact: .{1,5}? at the end of the pattern will always match 1 character (if there is any) because the requirement is to match at least 1, and it is sufficient to return a valid match. So, if you write D.{1,5}?, you will get D1 and D6 in 123456789D12345D678904.
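A short sketch of this with Python's `re.findall`, with the greedy version shown for comparison:

```python
import re

text = "123456789D12345D678904"

# Lazy quantifier at the end of the pattern: one character is enough.
lazy_matches = re.findall(r"D.{1,5}?", text)
print(lazy_matches)    # ['D1', 'D6']

# Greedy counterpart: takes up to five characters after each D.
greedy_matches = re.findall(r"D.{1,5}", text)
print(greedy_matches)  # ['D12345', 'D67890']
```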

Fun fact 2: In .NET, you can "ask" the regex engine to analyze the string from right to left with the help of the RightToLeft modifier. Then, with .{1,5}?D, you will get 9D, see this demo.

Fun fact 3: In .NET, (?<=(.{1,5}?))D will capture 9 into Group 1 if 123456789D is passed as input. This happens because of the way the lookbehind is implemented in .NET regex (.NET reverses the string as well as the pattern inside the lookbehind, then attempts to match that single pattern on the reversed string). And in Java, (?<=(.{1,5}))D (the greedy version) will capture 9 because it tries all the possible fixed-width patterns in the range, from the shortest to the longest, until one succeeds.

And a solution is: if you know you need 1 character followed by D, just use

/.D/
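In Python's `re` module (used here only as an example flavor), the same pattern would look like this:

```python
import re

subject = "123456789D123456789"

# Exactly one character followed by D.
one_char = re.search(r".D", subject)
print(one_char.group())  # 9D
```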
Wiktor Stribiżew
  • Why is 9D not matched given the range `{1,5}` in `.{1,5}?D`? Why is it taking 5 (maximum) numbers, why not 1 (minimum)? –  Mar 11 '16 at 16:02
  • 1
    @noob: Because "lazy" does not equal "the shortest possible match". The engine works from left to right, so any match will have the left-most starting character. However, in some flavors, there is a way to get `9D`, like in .NET (see Fun Fact 2). – Wiktor Stribiżew Mar 11 '16 at 16:08
  • 1
    I mean to say `.{1,5}` is greedy, it matches `12345`. `.{1,5}?` is lazy, it matches `1`. Then shouldn't `.{1,5}?D` be lazy too and match `9D` ? But it becomes greedy and matches `56789D`. –  Mar 11 '16 at 16:11
  • 1
    Again, you are confusing notions. Greedy and lazy only pertain to the rightmost character a pattern can match; they do not have anything to do with the leftmost one. I should add this to the beginning of my answer. – Wiktor Stribiżew Mar 11 '16 at 16:12
  • 1
    @noob I suggest you play a bit with this [Debuggex link](https://www.debuggex.com/r/s6u4IXIm9ha2i87N) (especially the cursors) to get a graphic explanation of this functionality – Thomas Ayoub Mar 11 '16 at 16:19
  • @Thomas: I did grasp a bit from Wiktor's answer. And sure, I will check it out. –  Mar 11 '16 at 16:21
  • @Wiktor Stribiżew Thx for explanations, so which regexp will match 9D? Because this is the result I need to get :) – A.D. Mar 11 '16 at 16:41
  • I wrote it at the bottom: `.D`. Here is the [**demo**](https://regex101.com/r/aP2uD9/1). If you are using PCRE regex, it is the only way. – Wiktor Stribiżew Mar 11 '16 at 16:42
  • lol yep of course :). Obvious answer but great explanations thx again Wiktor. – A.D. Mar 11 '16 at 16:50
  • @WiktorStribiżew Thanks for the great explanation :) – Fawaz Ahmed Oct 27 '20 at 13:01
2

Your regex is

.{1,5}?D

matches

123456789D123456789
    ------

But you said you expected 9D because of the "non-greedy quantifier".

Anyway, how about this?

D.{1,5}?

What is the result of matching?

Yes! As you expected, it matches

123456789D123456789
         --

So, WHY?

OK. First, I think you need to understand that a regex engine normally reads the characters of an input string from left to right. Considering your example, which uses a non-greedy quantifier, once the engine has matched

123456789D123456789
    ------

It will not go further to

123456789D123456789
     -----
123456789D123456789
      ----
...
123456789D123456789
        --

Because the regex engine returns the first match it finds while scanning from the left and evaluates as little text as possible. This is why these are also called "lazy quantifiers".

It works the same way with my regex D.{1,5}?, which will not go further to

123456789D123456789
         ---
123456789D123456789
         ----
...
123456789D123456789
         ------

But stop at the first match

123456789D123456789
         --
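The underlined diagrams above can be reproduced with a small Python sketch. `underline` is a hypothetical helper introduced here only to draw the match position.

```python
import re

subject = "123456789D123456789"

def underline(m):
    """Return a row of dashes under the matched span (illustrative helper)."""
    start, end = m.span()
    return " " * start + "-" * (end - start)

lazy_first = re.search(r".{1,5}?D", subject)  # lazy part before D
lazy_last = re.search(r"D.{1,5}?", subject)   # lazy part after D

for m in (lazy_first, lazy_last):
    print(subject)
    print(underline(m))

print(lazy_first.span(), lazy_last.span())  # (4, 10) (9, 11)
```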
fronthem
0

If you have a string containing digits followed by a non-digit, the minimum that {1,5}? needs to match is always 1 (so the range is not necessary). I don't think the lazy operator actually works the way we expect on the numeric range.

If you make the first \d+ greedy, as below, you'll get the minimum number of digits before the D.

(\d+)(\d{1,5}D) Matches 9D in the second group

If you make the first set of digits lazy, then you'll get the maximum number of digits (5).

(\d+?)(\d{1,5}D) Matches 56789D in the second group

I think these regex expressions might be more in line with what you need.
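A quick check of both expressions in Python's `re` module, using the test string from the question:

```python
import re

subject = "123456789D123456789"

# Greedy first group: it takes as many digits as it can,
# leaving the minimum (one digit) for the second group.
greedy_first = re.search(r"(\d+)(\d{1,5}D)", subject)
print(greedy_first.group(1), greedy_first.group(2))  # 12345678 9D

# Lazy first group: it takes as few digits as it can,
# so the second group gets the maximum (five digits).
lazy_first_group = re.search(r"(\d+?)(\d{1,5}D)", subject)
print(lazy_first_group.group(1), lazy_first_group.group(2))  # 1234 56789D
```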

NickT