I will start with an example, as it might be the easiest explanation. We have a multi-line file:
...
STARTING LINE with something 83
...
STARTING LINE with other 12
...
ENDING LINE with yet another info
...
STARTING LINE with another 43
...
The ... means anything (multiple lines, including empty lines) except STARTING LINE .* and ENDING LINE .*.
We have to capture groups containing all STARTING LINE .* lines that are not followed by an ENDING LINE .* line (with no other STARTING LINE .* in between), which means the first and the last occurrence of STARTING LINE .* in the example.
The number of occurrences of STARTING LINE .* alone and of STARTING LINE .*...ENDING LINE .* pairs is not known.
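To make the expected result concrete, here is a plain line-by-line scan (no regex) that follows my reading of the rules above. The helper name `unpaired_starts` and the `sample` string are made up for illustration, and the `...` filler lines are taken literally here.

```python
def unpaired_starts(text):
    """Return STARTING LINE lines that are not closed by an ENDING LINE
    before the next STARTING LINE (or before the end of the text)."""
    result = []
    pending = None  # most recent STARTING LINE still waiting for an ENDING LINE
    for line in text.splitlines():
        if line.startswith("STARTING LINE "):
            if pending is not None:
                result.append(pending)  # previous start was never closed
            pending = line
        elif line.startswith("ENDING LINE "):
            pending = None  # this closes the pending start, so drop it
    if pending is not None:
        result.append(pending)  # last start reached the end unclosed
    return result


# The example file from above, with "..." kept as literal filler lines.
sample = "\n".join([
    "...",
    "STARTING LINE with something 83",
    "...",
    "STARTING LINE with other 12",
    "...",
    "ENDING LINE with yet another info",
    "...",
    "STARTING LINE with another 43",
    "...",
])

wanted = unpaired_starts(sample)
# wanted holds the first and the last STARTING LINE of the example
```

This is only a description of the desired output; what I am looking for is a regex that produces the same groups.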
I have tried multiple expressions with positive and negative lookaheads and lookbehinds, but never managed to capture the occurrences properly.
I can provide more examples if needed, but it might be hard to give you the expressions I've already tried, as I didn't keep track of them, and the current ones capture all occurrences, including the one we don't want:
(^STARTING LINE .*?$)(?!^ENDING LINE)[.\n]+
(^STARTING LINE .*?$(?!.*^ENDING LINE)[.\n]*)
Note that we want to have only the STARTING LINE .* lines in a group.
We use the Python 2.7 regex engine with the re.MULTILINE flag (gm). I also tried the additional re.DOTALL (s) option, with no success.
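Here is a minimal reproduction of the setup, assuming the `...` filler lines are literal. It runs the first expression above with re.MULTILINE and, for comparison, a candidate pattern that uses a negative lookahead skipping over lines that do not begin with STARTING LINE. The names `sample_text`, `attempt` and `candidate` are made up for illustration, and the candidate is only a sketch checked against this one sample, not a verified solution.

```python
import re

# The example file, with "..." kept as literal filler lines.
sample_text = "\n".join([
    "...",
    "STARTING LINE with something 83",
    "...",
    "STARTING LINE with other 12",
    "...",
    "ENDING LINE with yet another info",
    "...",
    "STARTING LINE with another 43",
    "...",
])

# First attempt from the question. The lookahead is tested at the end of the
# STARTING LINE line, where ^ cannot match, so it never rejects anything.
# Also, [.\n] is a character class: it matches a literal dot or a newline.
attempt = re.compile(r"(^STARTING LINE .*?$)(?!^ENDING LINE)[.\n]+", re.MULTILINE)
attempt_result = attempt.findall(sample_text)  # all three STARTING LINE lines

# Candidate sketch: after a STARTING LINE line, refuse the match if an
# ENDING LINE line can be reached by stepping only over lines that do not
# themselves begin with "STARTING LINE ".
candidate = re.compile(
    r"^(STARTING LINE .*)$"
    r"(?!(?:\n(?!STARTING LINE ).*)*?\nENDING LINE )",
    re.MULTILINE,
)
candidate_result = candidate.findall(sample_text)  # first and last only
```

The candidate does not need re.DOTALL, because each line is consumed explicitly as a newline followed by `.*`.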