I'm looking for a regex in Python that splits a string on commas, but only on commas that are not inside parentheses.
Example:
string = '(parent son, daugther , father ), sister'
Expected result:
['(parent son, daugther , father )', 'sister']
Thanks for your help.
Generally speaking, regexes are not good at matching nested or recursive structures. So while it might be possible to get this working with a pattern alone, you'd have a much easier time doing the splitting by hand, e.g.:
import re

groups = []
nesting = 0  # current parenthesis depth
idx = 0      # start of the piece being collected
for group in re.finditer(r'[,()]', string):
    assert nesting >= 0
    if group[0] == '(':
        nesting += 1
    elif group[0] == ')':
        nesting -= 1
    elif nesting > 0:
        continue  # ignore commas in parens
    else:
        # top-level comma: close the current piece
        groups.append(string[idx:group.start()].strip())
        idx = group.end()
# after last group
groups.append(string[idx:].strip())
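To make the loop above easier to reuse, here is a minimal sketch that wraps the same idea in a function and also tries it on a nested input. The function name `split_top_level` and the second test string are illustrative assumptions, not part of the original answer, and the sketch assumes the parentheses are balanced.

```python
import re

def split_top_level(text):
    """Split text on commas that are not inside parentheses (hypothetical helper)."""
    parts = []
    depth = 0   # current parenthesis nesting level
    start = 0   # index where the current piece begins
    for m in re.finditer(r'[,()]', text):
        ch = m.group(0)
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        elif depth == 0:
            # a comma at nesting level 0 ends the current piece
            parts.append(text[start:m.start()].strip())
            start = m.end()
    # whatever follows the last top-level comma
    parts.append(text[start:].strip())
    return parts

print(split_top_level('(parent son, daugther , father ), sister'))
# ['(parent son, daugther , father )', 'sister']
print(split_top_level('a, (b, (c, d)), e'))
# ['a', '(b, (c, d))', 'e']
```

Because the depth counter tracks every opening and closing parenthesis, nested groups stay intact, which is the case a plain regex struggles with.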
For your specific example, I would go with a regex split using a positive lookbehind, i.e. splitting the string at a comma (and any whitespace after it) that is preceded by a closing round bracket. Note that this only splits at commas that directly follow a closing parenthesis:
import re
string = '(parent son, daugther , father ), sister'
output = re.split(r'(?<=\)),\s*', string)
# ['(parent son, daugther , father )', 'sister']
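The lookbehind split depends on every top-level comma sitting right after a closing parenthesis. As a hedged alternative (a swapped-in technique, not the answerer's method), a negative lookahead can reject commas that are followed by a `)` with no `(` in between. The sketch below contrasts the two; the function names and the second test string are assumptions for illustration, and the lookahead variant assumes parentheses are balanced and not nested.

```python
import re

def split_after_paren(text):
    # Lookbehind split as in the answer above (whitespace after the comma optional).
    return re.split(r'(?<=\)),\s*', text)

def split_not_in_parens(text):
    # Assumed variant: a comma counts only if no ')' follows it
    # before the next '(' (i.e. it is not inside an open group).
    return re.split(r',\s*(?![^()]*\))', text)

s1 = '(parent son, daugther , father ), sister'
s2 = 'a, (b, c), d'

print(split_after_paren(s1))    # ['(parent son, daugther , father )', 'sister']
print(split_not_in_parens(s1))  # ['(parent son, daugther , father )', 'sister']

print(split_after_paren(s2))    # ['a, (b, c)', 'd']
print(split_not_in_parens(s2))  # ['a', '(b, c)', 'd']
```

On the question's string both give the same result; they differ once a top-level comma appears that is not preceded by `)`.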
Using the third-party regex module instead of re, you can use (*SKIP)(*FAIL):
import regex  # third-party module: pip install regex
string = '(parent son, daugther , father ), sister'
res = regex.split(r'\(.+?\)(*SKIP)(*FAIL)|,', string)
print(res)
Output:
['(parent son, daugther , father )', ' sister']
Explanation:
\(.+?\) tries to match an opening and a closing parenthesis with some data inside.
(*SKIP)(*FAIL) discards that match if parentheses were found.
| otherwise,
, matches a comma. At this point, we are sure it is not between parentheses.
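The same "match what you want to skip, otherwise match the comma" logic can be approximated with the standard re module, which has no (*SKIP)(*FAIL) verbs. This is only a sketch under that assumption: the comma is put in a capture group, and matches where the group is empty (a parenthesised chunk) are ignored. The function name `split_skip_parens` is hypothetical.

```python
import re

def split_skip_parens(text):
    """Emulate the (*SKIP)(*FAIL) split using only the standard library."""
    parts = []
    start = 0
    # Either consume a whole "( ... )" chunk, or capture a comma in group 1.
    for m in re.finditer(r'\(.+?\)|(,)', text):
        if m.group(1) is None:
            continue  # a parenthesised chunk was consumed: nothing to split here
        parts.append(text[start:m.start()])
        start = m.end()
    parts.append(text[start:])
    return parts

print(split_skip_parens('(parent son, daugther , father ), sister'))
# ['(parent son, daugther , father )', ' sister']
```

As with the regex version, the leading space before 'sister' is kept; applying `.strip()` to each piece removes it.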