
Imagine the following multi-line text:

foo1
  some text
foo12
  some text
  bar
  some text
foo123
  some text

I want to find out which foo bar belongs to. In other words, I need to match only the digits immediately after the last foo that is still followed by bar.

In the example above the last foo meeting the condition is foo12, so I would like to match 12.

I know almost nothing about regex, and so far I have come up with this:

(?s)(?<=foo)\d*(?=.*bar)

You can check it out here:

https://regex101.com/r/FiwO14/1

It matches the numbers after the first two foos (foo1 and foo12), but I need only the second of them.

David

1 Answer


You need to use a (?:(?!foo).)*? tempered greedy token in the lookahead:

(?s)(?<=foo)\d*(?=(?:(?!foo).)*?bar)
                  ^^^^^^^^^^^^^^

See the regex demo

The (?:(?!foo).)*? pattern matches any char (.) that does not start a foo sequence, 0 or more times but as few as possible (*?). The lookahead can therefore reach bar only if no other foo lies in between.
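To see the difference in practice, here is a minimal sketch in Python. It assumes Python's built-in re module, which is not necessarily the engine used in the question, and it rebuilds the sample text by hand. The variable names are illustrative only.

```python
import re

# Sample text from the question, rebuilt line by line.
text = "\n".join([
    "foo1",
    "  some text",
    "foo12",
    "  some text",
    "  bar",
    "  some text",
    "foo123",
    "  some text",
])

# Original attempt: the lookahead may skip over later "foo" lines.
plain = re.compile(r"(?s)(?<=foo)\d*(?=.*bar)")

# Tempered greedy token: the lookahead stops at the next "foo".
tempered = re.compile(r"(?s)(?<=foo)\d*(?=(?:(?!foo).)*?bar)")

plain_matches = plain.findall(text)
tempered_matches = tempered.findall(text)

print(plain_matches)     # ['1', '12']
print(tempered_matches)  # ['12']
```

The first pattern reports both foo1 and foo12 because .* with DOTALL can run past any number of foo lines before it finds bar. The tempered version refuses to consume a character that starts foo, so only foo12 qualifies.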

You may also write it as follows if the foos are always at the start of a line:

(?<=foo)\d*(?=.*(?:\R(?!foo\d).*)*bar)

See another regex demo (note that the (?s) DOTALL modifier is absent here because it is not needed). The .*(?:\R(?!foo\d).*)* pattern matches:

  • .* - the rest of the line
  • (?:\R(?!foo\d).*)* - zero or more consecutive sequences of:
    • \R(?!foo\d) - any line break sequence (\R) that is not followed by foo and a digit
    • .* - the rest of the line.
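The unrolled pattern can be tried the same way. This is a sketch under one loud assumption: Python's built-in re module does not support \R, so the example substitutes \n and therefore only handles LF line endings. In an engine that supports \R (PCRE, for example) the pattern from the answer can be used unchanged.

```python
import re

# Sample text from the question, rebuilt line by line (LF line endings).
text = "\n".join([
    "foo1",
    "  some text",
    "foo12",
    "  some text",
    "  bar",
    "  some text",
    "foo123",
    "  some text",
])

# ASSUMPTION: \R replaced with \n, since Python's re has no \R.
# No (?s) flag: each .* stays on one line, and crossing to the
# next line is only allowed when that line does not start with foo + digit.
unrolled = re.compile(r"(?<=foo)\d*(?=.*(?:\n(?!foo\d).*)*bar)")

unrolled_matches = unrolled.findall(text)
print(unrolled_matches)  # ['12']
```

Because the lookahead advances a whole line at a time instead of checking (?!foo) before every single character, it does less work per character on inputs like this one.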
Wiktor Stribiżew
  • Thank you Wiktor. I need to wait some minutes to accept the answer, but both solutions work perfectly. Thanks specially for including the explanation and the terminology. – David Sep 06 '17 at 08:33
  • Note that [unroll-the-loop approach](http://www.softec.lu/site/RegularExpressions/UnrollingTheLoop) (the second one) is much more efficient and should be preferred. A plain tempered greedy token is only good when we are lazy or when using a variable, or backreference. – Wiktor Stribiżew Sep 06 '17 at 08:35
  • Nice, I can see in *regex101* that it actually needs less steps to achieve it. – David Sep 06 '17 at 08:42
  • Yeah, but the number of steps is not actually a direct evidence that one regex is more efficient than another. In this case, sure, the second one will be more efficient because of the unroll-the-loop technique. Remember that a regex performance should only be checked in the target environment. – Wiktor Stribiżew Sep 06 '17 at 08:45