The regex in question is:

(edit[\s\S]{0,}?service ("ALL")[\s\S]{0,}?next)

In the following example, my regex works as expected and finds both matches:

edit 1035
    set schedule "always"
    set service "ALL"
    set utm-status enable
next
edit 103
    set schedule "always"
    set service "ALL"
    set utm-status enable
next

See: https://regex101.com/r/A5E8Iu/1/

However, if I change the first occurrence of ALL for ALL2:

edit 1035
    set schedule "always"
    set service "ALL2"
    set utm-status enable
next
edit 103
    set schedule "always"
    set service "ALL"
    set utm-status enable
next

See: https://regex101.com/r/A5E8Iu/2

the match seems to become greedy: it spans the first block as well, instead of covering only the second one.

Can someone explain to me why the match does not start at "edit 103" in the updated example above?

Bohemian
Guillaume Caillé

3 Answers

1

Remember that a regex engine parses strings from left to right.

You have blocks of substrings that are delimited with edit and next. Since a match can start at the first edit, the engine starts there, and then [\s\S]*? expands up to the first occurrence of service "ALL", which is in the second block.
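To make the left-to-right behavior concrete, here is a minimal sketch using Python's re module (an assumption on my side; the question does not name a language or regex flavor). It runs the original pattern against the second sample and shows that the match begins at the very first edit.

```python
import re

# The second sample from the question: only the second block has service "ALL".
text = (
    'edit 1035\n'
    '    set schedule "always"\n'
    '    set service "ALL2"\n'
    '    set utm-status enable\n'
    'next\n'
    'edit 103\n'
    '    set schedule "always"\n'
    '    set service "ALL"\n'
    '    set utm-status enable\n'
    'next\n'
)

# The original pattern, unchanged.
pattern = re.compile(r'(edit[\s\S]{0,}?service ("ALL")[\s\S]{0,}?next)')
m = pattern.search(text)

# The leftmost possible start wins: the match begins at offset 0 ("edit 1035").
print(m.start())
# The lazy quantifier had to expand across the first block to reach "ALL".
print(m.group(1).count("edit"))
print(m.group(2))
```

The quantifier is still lazy here; it simply has no shorter way to succeed from that starting position.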

You might fix the regex using a tempered greedy token:

edit(?:(?!edit)[\s\S])*?service ("ALL")[\s\S]*?next
    ^^^^^^^^^^^^^^^^^^^^

See this regex demo.

The (?:(?!edit)[\s\S])*? construct matches any char ([\s\S]) that does not start the edit char sequence, 0+ repetitions, as few as possible (*?).
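As a quick check of the tempered greedy token, the following sketch runs that pattern in Python (again an assumption about the environment) against the sample where the first block uses "ALL2".

```python
import re

text = (
    'edit 1035\n'
    '    set schedule "always"\n'
    '    set service "ALL2"\n'
    '    set utm-status enable\n'
    'next\n'
    'edit 103\n'
    '    set schedule "always"\n'
    '    set service "ALL"\n'
    '    set utm-status enable\n'
    'next\n'
)

# (?!edit) is checked before every consumed char, so the match
# cannot run from one "edit" past the next one.
pattern = re.compile(r'edit(?:(?!edit)[\s\S])*?service ("ALL")[\s\S]*?next')
m = pattern.search(text)

# The attempt starting at "edit 1035" fails, so the engine moves on
# and the match now starts at the second block.
print(m.group(0))
```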

However, if edit or next happen to be inside the block, you will have incorrect matches. A safer regex will look like

(?m)^\h*edit \d+(?:(?!^\h*edit)[\s\S])*?service ("ALL")[\s\S]*?\R\h*next$

See the regex demo

Details

  • (?m)^ - start of a line
  • \h* - 0+ horizontal whitespaces
  • edit \d+ - edit, space and 1+ digits
  • (?:(?!^\h*edit)[\s\S])*? - any text, as few chars as possible, that does not run into an edit at the start of a line (optionally preceded with 0+ horizontal whitespaces), up to the first...
  • service ("ALL") - service "ALL" substring ("ALL" is captured into Group 1)
  • [\s\S]*? - any 0+ chars, as few as possible
  • \R - a line break
  • \h* - 0+ horizontal whitespaces
  • next - a literal substring
  • $ - end of a line.
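The pattern above uses \h and \R, which Python's re module does not support. The sketch below is therefore an adaptation rather than the pattern itself: it substitutes [ \t] for \h and \r?\n for \R, which is narrower than the originals. The sample block and its "set comments" line are hypothetical, added only to show an edit inside a block.

```python
import re

# Hypothetical block whose comment contains the words "edit" and "next".
text = (
    'edit 7\n'
    '    set comments "edit next week"\n'
    '    set service "ALL"\n'
    '    set utm-status enable\n'
    'next\n'
)

# Simple tempered version: any "edit" is treated as a block start.
simple = re.compile(r'edit(?:(?!edit)[\s\S])*?service ("ALL")[\s\S]*?next')

# Line-anchored version, adapted for Python ([ \t] for \h, \r?\n for \R).
safe = re.compile(
    r'(?m)^[ \t]*edit \d+(?:(?!^[ \t]*edit)[\s\S])*?'
    r'service ("ALL")[\s\S]*?\r?\n[ \t]*next$'
)

simple_m = simple.search(text)
safe_m = safe.search(text)

# The simple pattern starts in the middle of the comment line.
print(repr(simple_m.group(0)[:14]))
# The anchored pattern starts at the real block header.
print(repr(safe_m.group(0)[:6]))
```

Anchoring edit and next to line boundaries is what keeps words inside a quoted value from being mistaken for block delimiters.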
Wiktor Stribiżew
0

First, notice that your regex can be simplified to:

(edit[\s\S]*?service ("ALL")[\s\S]*?next)

Now, regarding your question: it does that because when you have

"ALL2"

in the text, there is now only ONE occurrence of

"ALL"

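in the entire text. Your regex pattern searches specifically for "ALL" (with no 2 between the L and the second double-quote), so starting from the first edit, the lazy quantifier has to keep expanding until it reaches that single occurrence.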

Josh Withee
0

Note that this

edit 1035
    set schedule "always"
    set service "ALL2"
    set utm-status enable
next
edit 103
    set schedule "always"
    set service "ALL"
    set utm-status enable
next

also matches your regex. It starts with

edit

then you have a bunch of chars (as few as possible) until the next service "ALL"

 1035
    set schedule "always"
    set service "ALL2"
    set utm-status enable
next
edit 103
    set schedule "always"
    set 

Now you have an occurrence of

service "ALL"

and then you have another bunch of chars until next

    set utm-status enable
next

So, your regex is working as written: the whole text matches the first capturing group (once), and "ALL" is captured by the second.

@Marathon55 pointed out that this regex can be simplified to

(edit.*?service ("ALL").*?next)

[\s\S] matches any char like . does,

{0,}? matches any quantity of them (ungreedy) like *? does

but in fact, . matches all chars except for line terminators, so unless the dotall (s) flag is enabled, this regex doesn't match anything here because of the line breaks.
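The difference between . and [\s\S] is easy to verify. The sketch below assumes Python's re module, where the dotall flag is spelled re.DOTALL; other flavors name it differently.

```python
import re

text = 'edit 103\n    set service "ALL"\nnext\n'

# Without a flag, "." stops at line breaks, so nothing matches.
dot_default = re.search(r'(edit.*?service ("ALL").*?next)', text)

# With DOTALL, "." also matches line breaks.
dot_all = re.search(r'(edit.*?service ("ALL").*?next)', text, re.DOTALL)

# [\s\S] matches line breaks regardless of flags.
any_char = re.search(r'(edit[\s\S]*?service ("ALL")[\s\S]*?next)', text)

print(dot_default)
print(dot_all.group(1) == any_char.group(1))
```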

alseether