My regex does not pick the closest 'cont' pair to the inner text. How can I fix that?
Input:
cont cont ItextI /cont /cont
Regex:
cont.*?I(.*?)I.*?/cont
Match:
cont cont ItextI /cont
Match I need:
cont ItextI /cont
cont(?:(?!/?cont).)*I(.*?)I(?:(?!/?cont).)*/cont
will only match the innermost block.
Explanation:
cont # match "cont"
(?: # Match...
(?!/?cont) # (as long as we're not at the start of "cont" or "/cont")
. # any character.
)* # Repeat any number of times.
I # Match "I"
(.*?) # Match as few characters as possible, capturing them.
I # Match "I"
(?: # Same as above
(?!/?cont)
.
)*
/cont # Match "/cont"
This explicitly forbids "cont" or "/cont" from appearing between the opening "cont" and the to-be-captured text (and between that text and the closing "/cont").
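To check this against the input from the question, here is a minimal sketch in Python. Python is my assumption, since the question does not name a language, and the names INNERMOST and innermost_match are only illustrative. The pattern itself uses nothing beyond lookahead, so it should behave the same in most regex flavors.

```python
import re

# The pattern from this answer, unchanged.
INNERMOST = re.compile(r"cont(?:(?!/?cont).)*I(.*?)I(?:(?!/?cont).)*/cont")

def innermost_match(text):
    """Return (whole match, captured text) for the first innermost block, or None."""
    m = INNERMOST.search(text)
    return (m.group(0), m.group(1)) if m else None

print(innermost_match("cont cont ItextI /cont /cont"))
# ('cont ItextI /cont', 'text')
```

The attempt starting at the first "cont" fails because the lookahead refuses to step over the second "cont", so the engine moves on and the match begins at the inner one.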
The reason you match on "cont cont ItextI /cont" is that the regex matches the first part of your pattern, cont, on the first "cont", then uses the reluctant .*? to gobble up the whitespace, the next "cont", and the whitespace preceding "ItextI". When it reaches "ItextI", it recognizes the I as matching the next part of the pattern and continues with the rest of the regex. As minitech writes, this is because the regex works from the beginning of the string and finds the earliest possible match.
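You can see the "earliest possible match" behaviour by looking at where the match starts. A small sketch, assuming Python (the question does not say which language is in use; ORIGINAL, TEXT and first_match are names I made up for the illustration):

```python
import re

# The pattern and input from the question, unchanged.
ORIGINAL = re.compile(r"cont.*?I(.*?)I.*?/cont")
TEXT = "cont cont ItextI /cont /cont"

def first_match(pattern, text):
    """Return (start offset, matched text) of the first match, or None."""
    m = pattern.search(text)
    return (m.start(), m.group(0)) if m else None

print(first_match(ORIGINAL, TEXT))
# (0, 'cont cont ItextI /cont')
```

The start offset is 0: the reluctant quantifier only keeps the match as short as possible from a given starting point, it does not move the starting point to the right.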
If you can make assumptions about the whitespace, you can write:
cont\s+I(.*?)I\s+/cont
This will match "cont ItextI /cont" in your example above.
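A quick sketch of what the whitespace assumption buys and costs, again assuming Python (STRICT is just an illustrative name, and the second input is a made-up variation, not from the question):

```python
import re

# The whitespace-only variant from this answer, unchanged.
STRICT = re.compile(r"cont\s+I(.*?)I\s+/cont")

# Input from the question: only the inner pair qualifies.
m = STRICT.search("cont cont ItextI /cont /cont")
print(m.group(0), "|", m.group(1))
# cont ItextI /cont | text

# Hypothetical input with other text between "cont" and the I...I part.
print(STRICT.search("cont foo ItextI /cont"))
# None
```

So this version is simpler, but it only works when nothing except whitespace separates the "cont" markers from the delimited text.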