
In Python, I'm looking for a regex that splits a string on commas, but only on commas that are not inside parentheses:

Example:

string = '(parent son, daugther  , father ), sister'

Expected result:

['(parent son, daugther  , father )', 'sister']

Thanks for your help

mohd4482

3 Answers


Generally speaking, regexes are not good at matching nested / recursive structures. So while it might be possible to get one to work, you'd have a much easier time doing the splitting by hand, e.g.:

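import re

string = '(parent son, daugther  , father ), sister'

groups = []
nesting = 0  # current parenthesis depth
idx = 0      # start index of the current top-level group
for match in re.finditer(r'[,()]', string):
    if match[0] == '(':
        nesting += 1
    elif match[0] == ')':
        nesting -= 1
        assert nesting >= 0, 'unbalanced parentheses'
    elif nesting > 0:
        continue  # ignore commas in parens
    else:
        # top-level comma: close the current group
        groups.append(string[idx:match.start()].strip())
        idx = match.end()
# after last group
groups.append(string[idx:].strip())
# ['(parent son, daugther  , father )', 'sister']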
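The same depth-counting idea can be wrapped in a reusable function. The sketch below is illustrative only: the name split_top_level is hypothetical, it scans character by character instead of using re.finditer, and it assumes you want a ValueError (rather than an assert) on unbalanced parentheses. The second call shows that the counter copes with nested parentheses.

```python
def split_top_level(text):
    """Split text on commas that are not inside parentheses.

    Hypothetical helper: raises ValueError on unbalanced parentheses.
    """
    parts = []
    depth = 0   # how many "(" are currently open
    start = 0   # start index of the current top-level piece
    for i, ch in enumerate(text):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                raise ValueError(f'unmatched ")" at index {i}')
        elif ch == ',' and depth == 0:
            # Comma at depth 0: end of the current piece.
            parts.append(text[start:i].strip())
            start = i + 1
    if depth != 0:
        raise ValueError('unclosed "("')
    parts.append(text[start:].strip())
    return parts

print(split_top_level('(parent son, daugther  , father ), sister'))
# ['(parent son, daugther  , father )', 'sister']

# Nested parentheses are handled by the depth counter:
print(split_top_level('a, (b, (c, d)), e'))
# ['a', '(b, (c, d))', 'e']
```

Raising instead of asserting keeps the check active even when Python runs with -O, which strips assert statements.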
Masklinn

For your specific example, I would use re.split with a positive lookbehind, i.e. split the string at a comma (plus any whitespace after it) that is immediately preceded by a closing parenthesis:

import re

string = '(parent son, daugther  , father ), sister'

output = re.split(r'(?<=\)),\s*', string)  # \s* also allows no space after the comma
# ['(parent son, daugther  , father )', 'sister']
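A quick sketch of how the lookbehind pattern behaves on a few other inputs (the extra strings are made up for illustration). It shows the difference between \s+ (requires whitespace after the comma) and \s* (whitespace optional), and that only commas right after ")" are split points, so this approach fits the example's shape rather than the general problem.

```python
import re

# Split only at commas that directly follow a closing parenthesis;
# \s* swallows any whitespace after the comma.
pattern = re.compile(r'(?<=\)),\s*')

print(pattern.split('(parent son, daugther  , father ), sister'))
# ['(parent son, daugther  , father )', 'sister']

# No space after the comma: \s* still splits, \s+ does not.
print(pattern.split('(a, b),c'))
# ['(a, b)', 'c']
print(re.split(r'(?<=\)),\s+', '(a, b),c'))
# ['(a, b),c']

# A comma between two plain words is not preceded by ')', so no split there.
print(pattern.split('(x, y), z, w'))
# ['(x, y)', 'z, w']
```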
mohd4482

Using the third-party regex module instead of the built-in re, you can use the (*SKIP)(*FAIL) verbs:

import regex

string = '(parent son, daugther  , father ), sister'  # avoid shadowing the built-in str
res = regex.split(r'\(.+?\)(*SKIP)(*FAIL)|,', string)
print(res)

Output:

['(parent son, daugther  , father )', ' sister']

How it works:

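  • \(.+?\) matches an opening parenthesis, at least one character, and the nearest closing parenthesis (lazy match)
  • (*SKIP)(*FAIL): if such a parenthesized group was matched, force the match to fail and resume searching after it, so commas inside it are never used as split points
  • | otherwise
  • , match a comma. At this point, we know it is not between parentheses.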
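If installing regex is not an option, the same "consume the parenthesized part, then only act on bare commas" idea can be emulated with the standard re module. This is a sketch, not the answer's method: the helper name split_outside_parens is hypothetical, it walks re.finditer matches instead of using backtracking verbs, and it swaps \(.+?\) for \([^()]*\) (no parentheses inside the group, empty "()" allowed).

```python
import re

# Alternation: first try to consume a whole "(...)" group (no nesting);
# otherwise match a single comma.
TOKEN = re.compile(r'\([^()]*\)|,')

def split_outside_parens(text):
    parts = []
    start = 0
    for m in TOKEN.finditer(text):
        if m.group() == ',':
            # Bare comma: it was not swallowed by a "(...)" match,
            # so it lies outside parentheses. Split here.
            parts.append(text[start:m.start()])
            start = m.end()
        # Otherwise a "(...)" group matched; scanning resumes after it,
        # so commas inside it are never seen (the role of (*SKIP)).
    parts.append(text[start:])
    return parts

print(split_outside_parens('(parent son, daugther  , father ), sister'))
# ['(parent son, daugther  , father )', ' sister']
```

Like the regex version, it keeps the leading space of ' sister'; call .strip() on each piece if you don't want it.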
Toto