The `\[.*?\]\[2\]` pattern works like this:
- `\[` finds the leftmost `[` (the regex engine processes the input string from left to right)
- `.*?` matches any 0 or more chars other than line break chars, as few as possible but as many as needed for a successful match, since the subsequent patterns must still match (see below)
- `\]\[2\]` matches the `][2]` substring.
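So, `.*?` is expanded one char at a time upon each failure until the leftmost `][2]` after that `[` is found. Note that lazy quantifiers do not guarantee the "shortest" matches: the match still starts at the leftmost position where the whole pattern can succeed, and laziness only limits how far it extends from there.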
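A quick check in Python's `re` module illustrates this. The language choice and the sample string `[a][1] [b][2]` are assumptions made for illustration; the behavior is the same in other backtracking engines. The lazy pattern starts at the leftmost `[` and runs all the way to the first `][2]`:

```python
import re

text = "[a][1] [b][2]"  # hypothetical sample input

# The match starts at the leftmost "[", so the lazy .*? has to
# consume "a][1] [b" before it reaches "][2]".
lazy = re.search(r"\[.*?\]\[2\]", text).group()
print(lazy)  # [a][1] [b][2]
```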
Solution
Instead of `.*?` (or `.*`), use negated character classes that match any char but the boundary chars:
\[[^\]\[]*\]\[2\]
See this regex demo.
Here, `.*?` is replaced with `[^\]\[]*`, which matches 0 or more chars other than `]` and `[`.
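Running the negated-class version on the same hypothetical input (again using Python's `re` purely as an illustration) shows that it can no longer step over the intermediate brackets, so only the `[...]` directly before `[2]` is returned:

```python
import re

text = "[a][1] [b][2]"  # hypothetical sample input

# [^\]\[]* cannot cross "]" or "[", so attempts starting at "[a" and "[1"
# fail, and the engine moves on until it reaches "[b]".
negated = re.search(r"\[[^\]\[]*\]\[2\]", text).group()
print(negated)  # [b][2]
```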
Other examples:
- Strings between angle brackets: `<[^<>]*>` matches `<...>` with no `<` and `>` inside
- Strings between parentheses: `\([^()]*\)` matches `(...)` with no `(` and `)` inside
- Strings between double quotation marks: `"[^"]*"` matches `"..."` with no `"` inside
- Strings between curly braces: `\{[^{}]*}` matches `{...}` with no `{` and `}` inside
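The four patterns above can be tried side by side with `re.findall`. The input strings here are made up for illustration, and each one is chosen so that a nested or repeated delimiter shows which span the negated class allows:

```python
import re

# Hypothetical inputs: each pattern only returns spans that contain
# no delimiter char between the opening and closing boundary.
samples = {
    r"<[^<>]*>": "a <b <c> d>",
    r"\([^()]*\)": "f(g(x), h(y))",
    r'"[^"]*"': 'say "hi" and "bye"',
    r"\{[^{}]*}": "{a{b}c}",
}
results = {pattern: re.findall(pattern, s) for pattern, s in samples.items()}
for pattern, found in results.items():
    print(pattern, found)
```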
In other situations, when the starting pattern is a multichar string or a complex pattern, use a tempered greedy token, `(?:(?!start).)*?`. To match `abc 1 def` in `abc 0 abc 1 def`, use `abc(?:(?!abc).)*?def`.
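A short comparison in Python's `re` module (an illustrative choice; the answer itself is not tied to one engine) contrasts the plain lazy dot with the tempered token on the string from the example:

```python
import re

text = "abc 0 abc 1 def"

# A plain lazy dot starts at the first "abc" and runs on to "def".
plain = re.search(r"abc.*?def", text).group()

# The tempered token refuses to consume a char where "abc" begins, so the
# attempt from the first "abc" fails and the engine retries at the second one.
tempered = re.search(r"abc(?:(?!abc).)*?def", text).group()

print(plain)     # abc 0 abc 1 def
print(tempered)  # abc 1 def
```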