When parsing this string:

import re
s = 'hello how are you? {{foo;;[[1;;2;;3]];;bar;;[[0;;2;;3]]}} im okay {{ABC;;DEF;;[[10;;11]]}}'
for m in re.findall(r'{{(.*?)}}', s):
    print('curly brackets: ', m)
    L = m.split(';;')
    print(L) 

For the first match, m.split(';;') should give this:

['foo', '[[1;;2;;3]]', 'bar', '[[0;;2;;3]]']

instead of:

['foo', '[[1', '2', '3]]', 'bar', '[[0', '2', '3]]']

How can I modify the split to do this?

Basj
  • See https://ideone.com/Mz6NxU – Wiktor Stribiżew Apr 13 '20 at 19:10
  • Regular expressions are not ideal parsers. You can do many arbitrarily complicated things with them (see https://github.com/Davidebyzero/RegexGolf/blob/master/regex%20for%20matching%20multiplication%20-%20factoring%20method.txt), but the question is better suited for code golf than Stack Overflow. – Cireo Apr 13 '20 at 19:12
  • Thanks a lot! Since it's a little bit different from the duplicate @WiktorStribiżew, could you post it as an answer? It would be great for future reference. – Basj Apr 13 '20 at 19:12
  • (@WiktorStribiżew Comments tend to be cleared sometimes) – Basj Apr 13 '20 at 19:12
  • It is always the same: `char` + `(?![^OPEN-CLOSE-DEL-CHARS]*DEL-CHAR)`. No need to multiply the same kind of knowledge. – Wiktor Stribiżew Apr 13 '20 at 19:19
  • @WiktorStribiżew For such complicated things, multiple examples are often welcome to understand the concept. – Basj Apr 13 '20 at 19:20

1 Answer

You may use this split with a negative lookahead:

L = re.split(r';;(?![^[]*])', m)

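This splits on ;; only where the negative lookahead (?![^[]*]) succeeds, that is, where the text to the right is not a run of zero or more non-[ characters followed by a ]. A ;; inside [[...]] always has such a ] ahead of it before the next [, so it is skipped and only the ;; outside the brackets are used as split points.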
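To see the lookahead at work, here is a minimal sketch that plugs the split into the loop from the question. The brackets and braces in the patterns are backslash-escaped for readability, which does not change what they match; the `results` list is only there to collect the output.

```python
import re

s = ('hello how are you? {{foo;;[[1;;2;;3]];;bar;;[[0;;2;;3]]}} '
     'im okay {{ABC;;DEF;;[[10;;11]]}}')

results = []
for m in re.findall(r'\{\{(.*?)\}\}', s):
    # Split on ';;' unless a ']' can be reached before the next '['.
    L = re.split(r';;(?![^\[]*\])', m)
    results.append(L)
    print(L)

# Prints:
# ['foo', '[[1;;2;;3]]', 'bar', '[[0;;2;;3]]']
# ['ABC', 'DEF', '[[10;;11]]']
```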

Note that this assumes [ and ] are balanced, unescaped, and not nested.
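If the [[...]] groups can nest, the lookahead is no longer reliable, because a ;; that sits just before an inner [[ looks like a top-level separator. In that case a small hand-written scanner that tracks the bracket depth may be easier to reason about. The following is only a sketch of that alternative, not part of the regex approach above; the function name `split_top_level` and its parameters are made up for illustration.

```python
def split_top_level(text, sep=';;', open_tok='[[', close_tok=']]'):
    """Split text on sep, ignoring any sep found inside open_tok ... close_tok."""
    parts = []      # finished pieces
    current = []    # characters of the piece being built
    depth = 0       # how many open_tok are currently unclosed
    i = 0
    while i < len(text):
        if text.startswith(open_tok, i):
            depth += 1
            current.append(open_tok)
            i += len(open_tok)
        elif depth > 0 and text.startswith(close_tok, i):
            depth -= 1
            current.append(close_tok)
            i += len(close_tok)
        elif depth == 0 and text.startswith(sep, i):
            # Separator outside all brackets: close the current piece.
            parts.append(''.join(current))
            current = []
            i += len(sep)
        else:
            current.append(text[i])
            i += 1
    parts.append(''.join(current))
    return parts


flat = split_top_level('foo;;[[1;;2;;3]];;bar;;[[0;;2;;3]]')
nested = split_top_level('a;;[[1;;[[2;;3]];;4]];;b')
print(flat)    # ['foo', '[[1;;2;;3]]', 'bar', '[[0;;2;;3]]']
print(nested)  # ['a', '[[1;;[[2;;3]];;4]]', 'b']
```

For the flat input from the question this gives the same result as the regex split; the difference only shows up once brackets are nested.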

anubhava