Why is my regex greedy?

Question

The regex in question is:

(edit[\s\S]{0,}?service ("ALL")[\s\S]{0,}?next)

In the following example, my regex works properly and finds all matches correctly:

edit 1035
    set schedule "always"
    set service "ALL"
    set utm-status enable
next
edit 103
    set schedule "always"
    set service "ALL"
    set utm-status enable
next

See: https://regex101.com/r/A5E8Iu/1/

However, if I change the first occurrence of ALL to ALL2:

edit 1035
    set schedule "always"
    set service "ALL2"
    set utm-status enable
next
edit 103
    set schedule "always"
    set service "ALL"
    set utm-status enable
next

See: https://regex101.com/r/A5E8Iu/2

it becomes greedy and includes the first block instead of matching only the second one.

Can someone explain why the match does not start at "edit 103" in this updated example?

You have blocks of text, and since the first block's `edit` can be matched first, it is matched, and then `[\s\S]*?` matches up to the first occurrence of `service "ALL"`, which is in the second block. The regex engine parses strings from left to right. You might fix it [like this, for example](https://regex101.com/r/J3X4ZE/1). – Wiktor Stribiżew, Dec 12 '17 at 21:46
@WiktorStribiżew Your regex is the only one that works for my case. While I don't fully understand it, I learned a lot from it. Thank you! – Guillaume Caillé, Dec 13 '17 at 12:30

score 1 · Accepted Answer · answered Dec 13 '17 at 14:02

Remember that a regex engine parses strings from left to right.

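You have blocks of substrings that are delimited with edit and next. Since the engine can start a match at the first edit, it does. The lazy [\s\S]*? then expands one character at a time until the rest of the pattern can match, and the first place where service "ALL" occurs (with the closing quote right after ALL) is in the second block. The match therefore runs from the first edit into the second block.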
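The behavior described above can be reproduced outside regex101. The following is a minimal sketch using Python's re module (an assumption; the question does not say which engine is used), with the pattern and sample text copied from the question:

```python
import re

# Pattern exactly as posted in the question.
pattern = re.compile(r'(edit[\s\S]{0,}?service ("ALL")[\s\S]{0,}?next)')

text = '''edit 1035
    set schedule "always"
    set service "ALL2"
    set utm-status enable
next
edit 103
    set schedule "always"
    set service "ALL"
    set utm-status enable
next
'''

match = pattern.search(text)

# The match begins at the very first "edit", not at "edit 103".
starts_at_first_edit = match.start() == 0

# The lazy quantifier had to expand past the first block's "next"
# before it could find the literal service "ALL".
edits_inside_match = match.group(1).count("edit")

print(starts_at_first_edit, edits_inside_match)  # True 2
```

Lazy quantifiers only control how far a match extends once it has started; they do not make the engine prefer a later starting position.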

You might fix the regex using a tempered greedy token:

edit(?:(?!edit)[\s\S])*?service ("ALL")[\s\S]*?next
    ^^^^^^^^^^^^^^^^^^^^

See this regex demo.

The (?:(?!edit)[\s\S])*? construct matches any char ([\s\S]), zero or more times but as few as possible (*?), provided that char is not the start of an edit char sequence.
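As a quick check of the tempered greedy token, here is a small sketch in Python's re module (an assumption about the engine; the pattern itself is the one from this answer and the text is the question's second example):

```python
import re

# Tempered greedy token: each consumed char must not begin the word "edit".
tempered = re.compile(r'edit(?:(?!edit)[\s\S])*?service ("ALL")[\s\S]*?next')

text = '''edit 1035
    set schedule "always"
    set service "ALL2"
    set utm-status enable
next
edit 103
    set schedule "always"
    set service "ALL"
    set utm-status enable
next
'''

match = tempered.search(text)

# The attempt starting at "edit 1035" fails, because the token cannot
# step over the second "edit"; the engine then retries at "edit 103".
first_line = match.group(0).splitlines()[0]
print(first_line)  # edit 103
```

The first block is rejected as a whole instead of being swallowed into the match, which is the behavior the question asks for.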

However, if edit or next happen to be inside the block, you will have incorrect matches. A safer regex will look like

(?m)^\h*edit \d+(?:(?!^\h*edit)[\s\S])*?service ("ALL")[\s\S]*?\R\h*next$

See the regex demo

Details

(?m)^ - start of a line
\h* - 0+ horizontal whitespaces
edit \d+ - edit, space and 1+ digits
(?:(?!^\h*edit)[\s\S])*? - any text, as short as possible, that does not run past an edit at the start of a line (optionally preceded with 0+ horizontal whitespaces), up to the first...
service ("ALL") - service "ALL" substring ("ALL" is captured into Group 1)
[\s\S]*? - any 0+ chars, as few as possible
\R - a line break
\h* - 0+ horizontal whitespaces
next - a literal substring
$ - end of a line.
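The pattern broken down above uses \h and \R, which Python's built-in re module does not support. The sketch below is therefore an adaptation, not the answer's exact regex: [ \t] stands in for \h and \r?\n stands in for \R, both of which are narrower than the originals. The sample text, including the set comments line, is made up to place the word edit inside a block.

```python
import re

# Simple tempered token from earlier in this answer.
simple = re.compile(r'edit(?:(?!edit)[\s\S])*?service ("ALL")[\s\S]*?next')

# Line-anchored variant, adapted for Python's re:
#   \h -> [ \t]   and   \R -> \r?\n   (assumed substitutions)
safer = re.compile(
    r'(?m)^[ \t]*edit \d+(?:(?!^[ \t]*edit)[\s\S])*?'
    r'service ("ALL")[\s\S]*?\r?\n[ \t]*next$'
)

# Hypothetical input: the second block contains "edit" mid-line.
text = '''edit 1035
    set service "ALL2"
next
edit 103
    set comments "edit me"
    set service "ALL"
next
'''

simple_match = simple.search(text)
safer_match = safer.search(text)

# The simple version starts inside the comment, at the stray "edit".
print(simple_match.group(0).splitlines()[0])  # edit me"

# The anchored version only accepts "edit <digits>" at the start of a line.
print(safer_match.group(0).splitlines()[0])   # edit 103
```

Anchoring both delimiters to line starts is what lets the word edit appear safely inside a block's values.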

score 0 · Answer 2 · Josh Withee · 2017-12-13T14:12:32.290

First, notice that your regex can be simplified to:

(edit[\s\S]*?service ("ALL")[\s\S]*?next)

Now, regarding your question: the reason is that when you have

"ALL2"

in the text, there is only ONE occurrence of

"ALL"

in the entire text. Your regex pattern searches specifically for "ALL" (with no 2 between the L and the closing double quote), so a match that starts at the first edit has to extend into the second block to find it.

edited Dec 13 '17 at 14:12

answered Dec 13 '17 at 05:17

Josh Withee


Hey, thank you for trying to help me. Sadly your regex does not match anything in my case since there are multiple lines. – Guillaume Caillé Dec 13 '17 at 12:27
Ah, that's because I tried simplifying the regex. The "." in my regex wasn't matching newlines. I'll edit the answer. – Josh Withee Dec 13 '17 at 14:11

score 0 · Answer 3 · edited May 11 '18 at 08:18

Note that this

edit 1035
    set schedule "always"
    set service "ALL2"
    set utm-status enable
next
edit 103
    set schedule "always"
    set service "ALL"
    set utm-status enable
next

also matches your regex. It starts with

edit

then you have a run of chars (as few as possible) up to the first service "ALL"

 1035
    set schedule "always"
    set service "ALL2"
    set utm-status enable
next
edit 103
    set schedule "always"
    set

Now you have an occurrence of

service "ALL"

and then you have another bunch of chars until next

    set utm-status enable
next

So your regex is working as written: the whole text matches the first capturing group (once), and "ALL" matches the second.

@Marathon55 pointed out that this regex could be simplified to

(edit.*?service ("ALL").*?next)

on the grounds that [\s\S] matches any char, as . does,

and {0,}? matches any number of them (lazily), as *? does.

In fact, though, . matches all chars except line terminators by default, so that regex doesn't match anything here because of the line breaks.
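The difference between . and [\s\S] is easy to confirm. This is a minimal sketch in Python's re module (an assumption about the engine), where the re.DOTALL flag is Python's way of letting . match line breaks; other flavors usually expose the same behavior as an s flag:

```python
import re

text = 'edit 103\n    set service "ALL"\nnext\n'

# The "simplified" pattern that uses "." instead of [\s\S].
dot_pattern = r'(edit.*?service ("ALL").*?next)'

# By default "." stops at the line break, so nothing matches.
default_result = re.search(dot_pattern, text)
print(default_result)  # None

# With DOTALL, "." behaves like [\s\S] and the block is found.
dotall_result = re.search(dot_pattern, text, re.DOTALL)
print(dotall_result is not None)  # True
```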
