Parse a string with two delimiters [[, {{

Question

When parsing this string:

import re
s = 'hello how are you? {{foo;;[[1;;2;;3]];;bar;;[[0;;2;;3]]}} im okay {{ABC;;DEF;;[[10;;11]]}}'
for m in re.findall(r'{{(.*?)}}', s):
    print('curly brackets: ', m)
    L = m.split(';;')
    print(L)

The m.split(';;') should give this:

['foo', '[[1;;2;;3]]', 'bar', '[[0;;2;;3]]']

instead of:

['foo', '[[1', '2', '3]]', 'bar', '[[0', '2', '3]]']

How can I modify the split to do this?

Regular expressions are not ideal parsers. You can do many arbitrarily complicated things (see https://github.com/Davidebyzero/RegexGolf/blob/master/regex%20for%20matching%20multiplication%20-%20factoring%20method.txt) but the question is better suited for code golf than stackoverflow. — Cireo, Apr 13 '20 at 19:12
Thanks a lot! Since it's a little bit different from the duplicate @WiktorStribiżew, could you post it as an answer? It would be great for future reference. — Basj, Apr 13 '20 at 19:12
It is always the same: `char` + `(?![^OPEN-CLOSE-DEL-CHARS]*DEL-CHAR)`. No need to multiply the same kind of knowledge. — Wiktor Stribiżew, Apr 13 '20 at 19:19
@WiktorStribiżew For such complicated things, multiple examples are often welcome to understand the concept. — Basj, Apr 13 '20 at 19:20

score 3 · Accepted Answer · answered Apr 13 '20 at 19:13

You may use this split with a negative lookahead:

L = re.split(r';;(?![^[]*])', m)

This splits on ;; only when the negative lookahead (?![^[]*]) succeeds. The lookahead rejects any ;; that is followed by zero or more non-[ characters and then a ], so a ;; inside [...] is never used as a split point, while a ;; outside the brackets is.
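As a quick check, the split can be run against the string from the question. This is only a minimal sketch: the brackets are escaped here (`[^\[]*\]`) purely for readability, and the names `splitter` and `fields` are illustrative, not part of the original answer.

```python
import re

s = ('hello how are you? {{foo;;[[1;;2;;3]];;bar;;[[0;;2;;3]]}} '
     'im okay {{ABC;;DEF;;[[10;;11]]}}')

# Split on ';;' only when it is NOT followed by a ']' with no '[' in between,
# i.e. only when the ';;' sits outside a [[...]] group.
splitter = re.compile(r';;(?![^\[]*\])')

# One list of fields per {{...}} block found in the input.
fields = [splitter.split(m) for m in re.findall(r'\{\{(.*?)\}\}', s)]
for parts in fields:
    print(parts)
# ['foo', '[[1;;2;;3]]', 'bar', '[[0;;2;;3]]']
# ['ABC', 'DEF', '[[10;;11]]']
```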

Note that this assumes [ and ] are balanced and unescaped.
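If that assumption might not hold, for example when [[...]] groups can be nested, the lookahead is no longer enough: a ;; directly followed by an inner [ would be treated as a split point. One possible alternative (not from the answer above) is a small scanner that tracks bracket depth instead of using a regex. The function name `split_outside_brackets` is hypothetical, and the sketch assumes every `[` and `]` is a real bracket with no escaping.

```python
def split_outside_brackets(text, sep=';;'):
    """Split text on sep, ignoring any sep that appears inside [...]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        # A separator only counts when we are outside all brackets.
        if depth == 0 and text.startswith(sep, i):
            parts.append(''.join(current))
            current = []
            i += len(sep)
            continue
        ch = text[i]
        if ch == '[':
            depth += 1
        elif ch == ']' and depth > 0:
            depth -= 1
        current.append(ch)
        i += 1
    parts.append(''.join(current))
    return parts

print(split_outside_brackets('foo;;[[1;;2;;3]];;bar;;[[0;;2;;3]]'))
# ['foo', '[[1;;2;;3]]', 'bar', '[[0;;2;;3]]']
print(split_outside_brackets('a;;[[1;;[[2;;3]];;4]];;b'))
# ['a', '[[1;;[[2;;3]];;4]]', 'b']
```

This trades the one-line regex for a few more lines of code, but it keeps working when groups are nested.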
