Find numbers after last occurrence of keyword1 but still before keyword2

Question

Imagine the following multi-line text:

foo1
  some text
foo12
  some text
  bar
  some text
foo123
  some text

I want to find out which foo bar belongs to. In other words, I need to match only the numbers immediately after the last foo that still has bar somewhere after it.

In the example above the last foo meeting the condition is foo12, so I would like to match 12.

I have almost no clue about regex, and so far I have come up with something like:

(?s)(?<=foo)\d*(?=.*bar)

You can check it out here:

https://regex101.com/r/FiwO14/1

It matches the numbers after the first two foos (foo1 and foo12), but I need only the second of them.

@CinCout No special reason, to be honest. After spending some time with it I was really curious about the "regex way" to do it :) – David Sep 06 '17 at 08:42

score 2 · Accepted Answer · answered Sep 06 '17 at 08:29

You need to use a (?:(?!foo).)*? tempered greedy token in the lookahead:

(?s)(?<=foo)\d*(?=(?:(?!foo).)*?bar)
                  ^^^^^^^^^^^^^^

See the regex demo

The (?:(?!foo).)*? pattern matches any char (.), zero or more times but as few as possible (*?), as long as that char does not start a foo sequence. The lookahead therefore cannot run past the next foo while searching for bar.
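To try this outside regex101, here is a minimal sketch in Python. The choice of Python's `re` module is an assumption (the question does not name a target language), and the `text`, `naive` and `tempered` names are only illustrative. It runs the pattern from the question and the tempered pattern against the sample input:

```python
import re

# Sample input copied from the question.
text = (
    "foo1\n"
    "  some text\n"
    "foo12\n"
    "  some text\n"
    "  bar\n"
    "  some text\n"
    "foo123\n"
    "  some text\n"
)

# Pattern from the question: .* in the lookahead may run across later "foo" lines.
naive = re.compile(r"(?s)(?<=foo)\d*(?=.*bar)")

# Tempered version: no char consumed by the lookahead may start a new "foo".
tempered = re.compile(r"(?s)(?<=foo)\d*(?=(?:(?!foo).)*?bar)")

naive_matches = naive.findall(text)
tempered_matches = tempered.findall(text)

print(naive_matches)     # ['1', '12']
print(tempered_matches)  # ['12']
```

The first result reproduces the behavior described in the question (both foo1 and foo12 match); the second keeps only the foo that actually owns bar.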

You may also write it as follows if the foos are always at the start of a line:

(?<=foo)\d*(?=.*(?:\R(?!foo\d).*)*bar)

See another regex demo (note that the (?s) DOTALL modifier is absent here because it is not needed). The .*(?:\R(?!foo\d).*)* matches:

.* - the rest of the line
(?:\R(?!foo\d).*)* - zero or more consecutive sequences of:
- \R(?!foo\d) - any line break sequence (\R) that is not followed by foo and a digit
- .* - the rest of the line.
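The breakdown above can be sketched in Python as well, with one substitution: Python's built-in `re` module does not support `\R`, so this sketch uses `\n` instead and assumes the input has `\n` line breaks only. The `text` and `unrolled` names are illustrative.

```python
import re

# Sample input copied from the question (assumed to use "\n" line breaks).
text = (
    "foo1\n"
    "  some text\n"
    "foo12\n"
    "  some text\n"
    "  bar\n"
    "  some text\n"
    "foo123\n"
    "  some text\n"
)

# Unrolled lookahead: consume the rest of the current line, then whole
# lines that do not begin with "foo<digit>", and finally require "bar".
# "\n" stands in for the \R of the original pattern.
unrolled = re.compile(r"(?<=foo)\d*(?=.*(?:\n(?!foo\d).*)*bar)")

unrolled_matches = unrolled.findall(text)
print(unrolled_matches)  # ['12']
```

No DOTALL flag is set here, so each `.*` stops at the end of its line and the only way to move to the next line is through the guarded `\n(?!foo\d)` step.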

Wiktor Stribiżew

Thank you Wiktor. I need to wait some minutes to accept the answer, but both solutions work perfectly. Thanks specially for including the explanation and the terminology. – David Sep 06 '17 at 08:33
Note that the [unroll-the-loop approach](http://www.softec.lu/site/RegularExpressions/UnrollingTheLoop) (the second one) is much more efficient and should be preferred. A plain tempered greedy token is only good when we are lazy, or when using a variable or a backreference. – Wiktor Stribiżew Sep 06 '17 at 08:35
Nice, I can see in *regex101* that it actually needs fewer steps to achieve it. – David Sep 06 '17 at 08:42
Yeah, but the number of steps is not direct evidence that one regex is more efficient than another. In this case, sure, the second one will be more efficient because of the unroll-the-loop technique. Remember that regex performance should only be checked in the target environment. – Wiktor Stribiżew Sep 06 '17 at 08:45
