2

I have this Python script that uses a regular expression. I want to split the string s by commas while ignoring any commas that exist within the brackets.

import re

s = """aa,bb,(cc,dd),m(ee,ff)"""
# flags must be passed by keyword; the third positional argument of re.split is maxsplit
splits = re.split(r'\s*(\([^)]*\)|[^,]+)', s, flags=re.M|re.S)
print('\n'.join(splits))
Actual output:
    aa
    ,
    bb
    ,
    (cc,dd)
    ,
    m(ee
    ,
    ff)
Desired output: 
    aa
    bb
    (cc,dd)
    m(ee,ff)

I can't make it handle text outside the brackets, such as the m in m(ee,ff). I was hoping someone could help me out.

h33

4 Answers

2

You may use this regex with a negative lookahead in re.split:

>>> import re
>>> s = """aa,bb,(cc,dd),m(ee,ff)"""
>>> print(re.split(r',(?![^()]*\))', s))
['aa', 'bb', '(cc,dd)', 'm(ee,ff)']


RegEx Details:

  • ,: Match a comma
  • (?![^()]*\)): A negative lookahead that keeps us from matching a comma inside (...). It asserts that the comma is not followed by zero or more non-bracket characters and then a closing ). This assumes the parentheses are balanced and not nested.
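
As a rough illustration of the two points above, here is a minimal sketch that applies the same pattern to the string from the question and to a nested input. The names TOP_LEVEL_COMMA, flat and nested are only illustrative, and the nested string is a made-up example, not something from the question.

```python
import re

# Pattern from the answer: a comma that is not followed by
# non-bracket characters and then a closing ")".
TOP_LEVEL_COMMA = re.compile(r',(?![^()]*\))')

# The string from the question: one level of parentheses.
flat = "aa,bb,(cc,dd),m(ee,ff)"
flat_parts = TOP_LEVEL_COMMA.split(flat)
print(flat_parts)  # ['aa', 'bb', '(cc,dd)', 'm(ee,ff)']

# Hypothetical nested input: the lookahead only sees the nearest
# bracket, so the comma after "b" is treated as top level.
nested = "a,f(b,(c,d),e),g"
nested_parts = TOP_LEVEL_COMMA.split(nested)
print(nested_parts)
```

With nesting, the comma after b is followed by an opening bracket rather than a closing one, so the lookahead lets the split through. For input like that, a depth counter is probably a safer tool than this regex.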
anubhava
1

Consider using findall instead: repeat a group that matches either a ( followed by non-) characters and a closing ), or a single non-comma character:

import re

s = """aa,bb,m(cc,dd)"""
matches = re.findall(r'(?:\([^)]+\)|[^,])+', s, re.M|re.S)
print('\n'.join(matches))

If speed is an issue, you can make it a bit more efficient by adding ( to the other negated character set, letting that set repeat, and putting that alternative first:

(?:[^(,]+|\([^)]+\))+
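
To compare the two findall variants, here is a small sketch that runs both on the string from the question. It assumes the parenthesised part is written with [^)]+, and the names SIMPLE and FASTER are only illustrative. It also shows one behavioural difference from a split-based approach: findall returns only non-empty matches, so empty fields disappear.

```python
import re

# Two variants of the findall pattern (names are illustrative).
SIMPLE = re.compile(r'(?:\([^)]+\)|[^,])+')
FASTER = re.compile(r'(?:[^(,]+|\([^)]+\))+')

s = "aa,bb,(cc,dd),m(ee,ff)"
simple_parts = SIMPLE.findall(s)
faster_parts = FASTER.findall(s)
print(simple_parts)  # ['aa', 'bb', '(cc,dd)', 'm(ee,ff)']
print(faster_parts)  # ['aa', 'bb', '(cc,dd)', 'm(ee,ff)']

# Empty fields: findall skips them, while re.split on "," keeps them.
with_gap = "a,,b"
found = FASTER.findall(with_gap)
split = re.split(r',', with_gap)
print(found)  # ['a', 'b']
print(split)  # ['a', '', 'b']
```

If empty fields matter for your data, splitting is likely the better fit. If they should be ignored, findall gives that for free.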
CertainPerformance
  • Check the number of steps it takes at https://regex101.com/r/UXdHRe/2 (272 steps) versus my suggested split regex (146 steps). – anubhava Mar 12 '19 at 06:47
0

I needed to do something similar, but I also had nested brackets. The proposed regular expressions do NOT handle nesting.

I couldn't find a regex solution, but here is a Python function that achieves the same thing by tracking the bracket depth (note that it looks for square brackets):

def comma_split(text: str) -> list[str]:
    flag = 0
    buffer = ""
    result = []
    for char_ in text:
        if char_ == "[":
            flag += 1
        elif char_ == "]":
            flag -= 1
        elif char_ == "," and flag == 0:
            result.append(buffer)
            buffer = ""
            continue
        buffer += char_
    if buffer:
        result.append(buffer)
    return result
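
The function above tracks square brackets, while the question uses parentheses. Below is a minimal sketch of the same depth-counting idea with the bracket pair passed in as parameters. The name split_top_level and its opener/closer parameters are hypothetical, and the list[str] annotation assumes Python 3.9 or later.

```python
def split_top_level(text: str, opener: str = "(", closer: str = ")") -> list[str]:
    """Split text on commas that are not inside opener/closer pairs."""
    depth = 0      # current nesting depth
    start = 0      # index where the current piece begins
    parts = []
    for i, ch in enumerate(text):
        if ch == opener:
            depth += 1
        elif ch == closer:
            # Clamp at zero so a stray closer does not go negative.
            depth = max(depth - 1, 0)
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    # Always append the last piece, even if it is empty.
    parts.append(text[start:])
    return parts


print(split_top_level("aa,bb,(cc,dd),m(ee,ff)"))
# ['aa', 'bb', '(cc,dd)', 'm(ee,ff)']
print(split_top_level("a,f(b,(c,d),e),g"))
# ['a', 'f(b,(c,d),e)', 'g']
```

Unlike the original function, this sketch keeps a trailing empty piece and slices the input instead of building a buffer character by character. Both are design choices you may want to change for your data.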
basil_man
-1

Try: r',([^,()][(][^()][)][^,])|([^,]+)'

Tested on regex101: https://regex101.com/r/pJxRwQ/1

shikai ng