This question is not related to variable length look-behind as it probably has a solution without negative look-behind.
In Python 3, I am trying to match a pattern that is probably at the limit of what can be achieved with a regexp, but I still want to give it a try. I am actually trying to avoid using a parsing tool.
What I want to match is the pattern that indicates a regexp set (a character class). So the following would be matched.
[abc]
[1-9\n\t]
[ \t\]]
[\\\]]
[[\\\\\\\]]
The square brackets cannot be nested; for example, in [[]] we want to match [[].
However, since \] indicates an escaped bracket, we need to skip those. A sequence such as \\] is an escaped backslash followed by a closing bracket, so it must be accepted as the end of the set. The following would not be matched.
[\]
[\\\]
[abc\\\]
The rule ends up being: match from [ to the first ] that is not preceded by an odd number of \ characters.
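To pin the rule down, here is a minimal sketch of it as a plain scan, not as a regexp and not as a suggested replacement for one. The function name first_set is hypothetical, and the sketch assumes an opening [ is never itself escaped.

```python
def first_set(text):
    """Return the first bracket set in text according to the rule, or None."""
    start = text.find('[')
    while start != -1:
        pos = start + 1
        while pos < len(text):
            if text[pos] == ']':
                # Count the run of backslashes immediately before this ']'.
                # The scan stops at text[start] == '[' at the latest.
                run = 0
                while text[pos - 1 - run] == '\\':
                    run += 1
                if run % 2 == 0:
                    # Even run (including zero): this ']' is not escaped.
                    return text[start:pos + 1]
            pos += 1
        # No closing bracket for this '[', so try the next one.
        start = text.find('[', start + 1)
    return None
```

For instance, first_set('[[]]') gives '[[]', while first_set(r'[\]') gives None because the only ] is escaped.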
It seems a negative lookbehind does not work here, because in Python's re module a lookbehind must have a fixed length.
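The fixed-length restriction can be seen directly. The sketch below tries to compile a hypothetical variable-width lookbehind for "a ] not preceded by an odd run of backslashes" and compares it with a fixed-width one; the names variable_width, fixed_width, and compiles are illustrative only.

```python
import re

# Hypothetical attempt: the lookbehind body can be 2, 4, 6, ... characters
# long, so its width is not fixed.
variable_width = r'(?<![^\\](?:\\\\)*\\)\]'

# A single-backslash lookbehind has a fixed width of one character, but it
# cannot tell an escaped ']' from an escaped backslash followed by ']'.
fixed_width = r'(?<!\\)\]'

def compiles(pattern_text):
    """Return True if the standard re module accepts the pattern."""
    try:
        re.compile(pattern_text)
    except re.error:
        return False
    return True
```

With the standard re module, compiles(variable_width) is False and compiles(fixed_width) is True.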
Edit: An interesting solution was given by Wiktor Stribiżew:
re.compile(r'\[[^]\\]*(?:\\.[^]\\]*)*\]')
Edit: Simpler version of the above by Rawing:
r'\[(?:\\.|[^]\\])*\]'
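As a quick sanity check, both patterns can be run against the examples from the question. This is only a sketch: the names pattern_a, pattern_b, and check are illustrative, and it assumes the matching examples should match in full while the others should not match anywhere.

```python
import re

pattern_a = re.compile(r'\[[^]\\]*(?:\\.[^]\\]*)*\]')  # Wiktor Stribiżew
pattern_b = re.compile(r'\[(?:\\.|[^]\\])*\]')         # Rawing

should_match = [
    r'[abc]',
    r'[1-9\n\t]',
    r'[ \t\]]',
    r'[\\\]]',
    '[[' + '\\' * 7 + ']]',  # odd run of backslashes before the inner ']'
]
should_not_match = [r'[\]', r'[\\\]', r'[abc\\\]']

def check(pattern):
    """Return (all matched in full, none matched, match found in '[[]]')."""
    full = all(pattern.fullmatch(s) for s in should_match)
    none = not any(pattern.search(s) for s in should_not_match)
    nested = pattern.search('[[]]').group(0)
    return full, none, nested
```

Both check(pattern_a) and check(pattern_b) should give (True, True, '[[]'). Note that . does not match a newline by default, so a backslash followed by a literal newline character would need re.DOTALL.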