I'm trying to match blocks of text that contain certain text within them. Each block is clearly defined by standard start/end text patterns.

In the example below I want to match steps 1 and 3, from "step start" to "step end", because they contain the text "database:dev". My current regex matches step 1 correctly, but then matches steps 2 and 3 together as a single match. It's probably easier to see with an example here: https://regex101.com/r/56tfOQ/3/

I need to specify that each match can only contain one "step start", but I can't work out how to do that.

The regex I'm currently using is:

(?msi)step start.*?database:dev.*?step end

An example of the text is:

step start
    name:step1
    database:dev1
step end
step start
    name:step2
    database:test1
step end
step start
    name:step3
    database:dev2
step end
step start
    name:step4
    database:test2
step end
mjharper

1 Answer

In a common scenario, you may use a tempered greedy token like (?:(?!<STOP_PATTERN>).)*? in between the starting delimiter and some third string that should appear in between delimiters.

You might write your regex as

(?si)step start(?:(?!step start).)*?database:dev.*?step end

However, your opening delimiter appears to sit at the start of a line, so it makes sense to anchor it there with ^ and use

(?msi)^step start(?:(?!^step start).)*?database:dev.*?step end

See the regex demo

Details

  • (?msi) - multiline, dotall and case insensitive modes are on
  • ^ - line start (since m option is on)
  • step start - starting delimiter
  • (?:(?!^step start).)*? - a tempered greedy token that matches any char, 0+ occurrences/repetitions, as few as possible, where each char consumed must not be the start of a step start char sequence located at the start of a line
  • database:dev - a literal substring
  • .*? - any 0+ chars, as few as possible
  • step end - ending delimiter.
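
To see the difference between the two patterns outside regex101, here is a minimal Python sketch run against the sample text from the question. It assumes Python's `re` module, which accepts the same inline `(?msi)` flags; the names `TEXT`, `ORIGINAL`, `TEMPERED` and the helper `step_names` are made up for this illustration and are not part of the question.

```python
import re

# Sample text copied from the question.
TEXT = """\
step start
    name:step1
    database:dev1
step end
step start
    name:step2
    database:test1
step end
step start
    name:step3
    database:dev2
step end
step start
    name:step4
    database:test2
step end
"""

# The asker's pattern and the anchored tempered-greedy-token pattern.
ORIGINAL = r"(?msi)step start.*?database:dev.*?step end"
TEMPERED = r"(?msi)^step start(?:(?!^step start).)*?database:dev.*?step end"


def step_names(pattern, text):
    """Return, for each match, the list of step names found inside it."""
    return [re.findall(r"name:(\w+)", m.group(0))
            for m in re.finditer(pattern, text)]


original_steps = step_names(ORIGINAL, TEXT)
tempered_steps = step_names(TEMPERED, TEXT)

print(original_steps)  # [['step1'], ['step2', 'step3']]
print(tempered_steps)  # [['step1'], ['step3']]
```

With the original pattern the second match swallows steps 2 and 3; with the tempered token each match contains exactly one step.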
Wiktor Stribiżew
  • perfect! thanks. I tried (?msi)step start(?:(?!step start)).*?database:dev.*?step end , at one point. Now I need to work out why the dot being inside the capturing group vs outside makes the difference! Thanks again – mjharper May 22 '19 at 19:45
  • @mjharper There is no capturing group here. The dot in `(?:(?!...).)*?` is part of a [tempered greedy token](https://stackoverflow.com/a/37343088/3832970), it is described well in my other answer. I have added more details. – Wiktor Stribiżew May 22 '19 at 19:47
  • sorry - I meant non-capturing group! Will read through your other answer. Thanks again. – mjharper May 23 '19 at 08:32
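
On the question raised in the comments (dot inside versus outside the group), a small Python sketch may help. It assumes Python's `re` module and uses hypothetical names (`TEXT`, `ATTEMPT`, `TEMPERED`, `starts_per_match`). With the dot outside the group, the lookahead is checked only once, right after the opening delimiter, and the following `.*?` is then free to run past the next "step start". With the dot inside the repeated group, the lookahead is re-checked before every single character.

```python
import re

# Sample text copied from the question.
TEXT = """\
step start
    name:step1
    database:dev1
step end
step start
    name:step2
    database:test1
step end
step start
    name:step3
    database:dev2
step end
step start
    name:step4
    database:test2
step end
"""

# Dot outside the group: the lookahead runs once, then .*? is unrestricted.
ATTEMPT = r"(?msi)step start(?:(?!step start)).*?database:dev.*?step end"
# Dot inside the group: the lookahead guards every character consumed.
TEMPERED = r"(?si)step start(?:(?!step start).)*?database:dev.*?step end"


def starts_per_match(pattern, text):
    """Count how many 'step start' delimiters each match contains."""
    return [m.group(0).count("step start")
            for m in re.finditer(pattern, text)]


attempt_counts = starts_per_match(ATTEMPT, TEXT)
tempered_counts = starts_per_match(TEMPERED, TEXT)

print(attempt_counts)   # [1, 2]
print(tempered_counts)  # [1, 1]
```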