python splitting by 'and' and 'or', but not in parentheses

Question

I have the following string:

(some text) or ((other text) and (some more text)) and (still more text)

I would like a Python regular expression that splits it up into

['(some text)', '((other text) and (some more text))', '(still more text)']

I've tried this but it doesn't work:

haystack = "(some text) or ((other text) and (some more text)) and (still more text)"
re.split('(or|and)(?![^(]*.\))', haystack) # no worky

Any help is appreciated.

Regex doesn't handle arbitrarily nested content very well. Beyond the example you showed us, there could be even more layers of nested parentheses. For this situation, using a parser might get you further than a regex. — Tim Biegeleisen, Aug 01 '17 at 05:37
This may help: https://stackoverflow.com/questions/26633452/how-to-split-by-commas-that-are-not-within-parentheses — Christian Dean, Aug 01 '17 at 05:37
This might also be useful: https://stackoverflow.com/questions/4284991/parsing-nested-parentheses-in-python-grab-content-by-level — perigon, Aug 01 '17 at 05:44
How about use `"(\(.+?\)) or (\(.*\)) and (\(.+?\))"` to extract content from original string? — stamaimer, Aug 01 '17 at 05:49

perigon · Answer 1 · 2017-08-01T06:02:26.910

This solution works for arbitrarily nested parentheses, which a single regex cannot handle in general (`s` is the original string):

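from pyparsing import nestedExpr

s = "(some text) or ((other text) and (some more text)) and (still more text)"

def lst_to_parens(elt):
    # Rebuild a parenthesized string from a (possibly nested) list of tokens.
    if isinstance(elt, list):
        return '(' + ' '.join(lst_to_parens(e) for e in elt) + ')'
    else:
        return elt

# Wrap the input in one extra pair of parentheses so that the whole
# string parses as a single nested expression.
split = nestedExpr('(', ')').parseString('(' + s + ')').asList()
# Keep only the top-level lists (the parenthesized groups); the bare
# strings 'or' and 'and' between them are dropped.
split_lists = [elt for elt in split[0] if isinstance(elt, list)]
print([lst_to_parens(elt) for elt in split_lists])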

Output:

['(some text)', '((other text) and (some more text))', '(still more text)']

For OP's real test case:

s = "(substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle')"

Output:

["(substringof ('needle' ,name))", "((role eq 'needle') and (substringof ('needle' ,email)))", "(job eq 'needle')", "(office eq 'needle')"]
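Note that the rebuilt strings above differ from the input in whitespace (for example `substringof ('needle' ,name)`), because the tokens are re-joined with spaces. If the exact substrings matter, one possible alternative is a plain depth counter over the raw string. The following is only a sketch using the standard library; the function name `top_level_groups` is made up for illustration, and it assumes that parentheses never appear inside quoted literals.

```python
def top_level_groups(text):
    """Return each outermost parenthesized group of `text`, verbatim."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i  # an outermost group opens here
            depth += 1
        elif ch == ')':
            if depth == 0:
                raise ValueError("unbalanced ')' at index %d" % i)
            depth -= 1
            if depth == 0:
                # the outermost group just closed: slice it out unchanged
                groups.append(text[start:i + 1])
    if depth != 0:
        raise ValueError("unbalanced '(' in input")
    return groups


s = ("(substringof('needle',name)) or ((role eq 'needle') and "
     "(substringof('needle',email))) or (job eq 'needle') or (office eq 'needle')")
for group in top_level_groups(s):
    print(group)
```

Because the groups are sliced directly out of the input, the original spacing is preserved, and the nesting depth is not limited.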

Avinash Raj · Accepted Answer · 2017-08-01T06:03:41.707

I would use re.findall instead of re.split. Note that this only works for a limited nesting depth: up to two levels of parentheses nested inside the outermost pair.

>>> import re
>>> s = '(some text) or ((other text) and (some more text)) and (still more text)'
>>> re.findall(r'\((?:\((?:\([^()]*\)|[^()]*)*\)|[^()])*\)', s)
['(some text)', '((other text) and (some more text))', '(still more text)']
>>>
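
The depth limit exists because the pattern spells out every nesting level by hand. As a sketch of how that generalizes, a small helper can build a pattern of the same kind for any fixed depth. The helper name `nested_paren_pattern` is hypothetical, and `max_depth` counts the outermost pair as level 1, so the pattern above corresponds roughly to `max_depth=3`.

```python
import re


def nested_paren_pattern(max_depth):
    """Compile a regex matching a balanced (...) group nested at most max_depth levels."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    inner = r'[^()]*'  # innermost level: no parentheses allowed at all
    for _ in range(max_depth - 1):
        # One more level: plain characters, or a complete group of the previous level.
        inner = r'(?:[^()]|\(' + inner + r'\))*'
    return re.compile(r'\(' + inner + r'\)')


s = '(some text) or ((other text) and (some more text)) and (still more text)'
print(nested_paren_pattern(2).findall(s))
# ['(some text)', '((other text) and (some more text))', '(still more text)']
```

The depth still has to be chosen up front. For input whose nesting is not known in advance, a parser or a depth counter is the safer choice.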

edited Aug 01 '17 at 06:03

answered Aug 01 '17 at 05:51


I tried to simplify my string and it backfired. Your solution does not work for my real string... (substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle') – Sid Kwakkel Aug 01 '17 at 05:59
@user1571934 How about this: https://regex101.com/r/zDmF3s/1 ? If the nesting goes any deeper than that, drop the idea of using a regex and write your own parser. – Avinash Raj Aug 01 '17 at 06:01
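
For reference, a hand-written splitter along the lines suggested in the last comment might look like the sketch below. It is not code from this thread: the name `split_top_level` is made up, and it assumes that quoted literals never contain parentheses or a standalone `and`/`or`. A regex is used only to find the tokens of interest, while the nesting is tracked with a counter.

```python
import re

# Tokens of interest: a parenthesis, or 'and'/'or' as a whole word.
TOKEN = re.compile(r'[()]|\b(?:and|or)\b')


def split_top_level(text):
    """Split `text` on 'and'/'or' that occur outside all parentheses.

    Returns (operands, operators), so the expression can be reassembled.
    """
    operands, operators = [], []
    depth = 0
    start = 0
    for match in TOKEN.finditer(text):
        tok = match.group()
        if tok == '(':
            depth += 1
        elif tok == ')':
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' at index %d" % match.start())
        elif depth == 0:
            # 'and' or 'or' at the top level: close the current operand
            operands.append(text[start:match.start()].strip())
            operators.append(tok)
            start = match.end()
    if depth != 0:
        raise ValueError("unbalanced '(' in input")
    operands.append(text[start:].strip())
    return operands, operators


s = ("(substringof('needle',name)) or ((role eq 'needle') and "
     "(substringof('needle',email))) or (job eq 'needle') or (office eq 'needle')")
operands, operators = split_top_level(s)
print(operands)
print(operators)  # ['or', 'or', 'or']
```

Unlike the `re.findall` approach, this keeps the operators, so the caller can tell whether two groups were joined by `and` or by `or`.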

R.A.Munna · Answer 3 · 2017-08-01T07:44:52.030

You could also try this:

import re
s = '(some text) or ((other text) and (some more text)) and (still more text)'
find_string = re.findall(r'[(]{2}[a-z\s()]*[)]{2}|[(][a-z\s]*[)]', s)
print(find_string)

Output:

['(some text)', '((other text) and (some more text))', '(still more text)']

Edit, to also handle whitespace after an opening parenthesis (see the comments):

find_string = re.findall(r'[(\s]{2}[a-z\s()]*[)\s]{2}|[(][a-z\s]*[)]', s)

edited Aug 01 '17 at 07:44

answered Aug 01 '17 at 06:02


This is not the right way to match the brackets. What if there is text between two opening brackets? – Avinash Raj Aug 01 '17 at 06:05
@AvinashRaj, could you please give a sample string? Thanks. – R.A.Munna Aug 01 '17 at 06:22
Check your regex against this string: `'(some text) or ( (other text) and (some more text)) and (still more text)'` – Avinash Raj Aug 01 '17 at 06:24
Yes, got it. Is `r'[(\s]{2}[a-z\s()]*[)\s]{2}|[(][a-z\s]*[)]'` okay? I get the right output with this regex. – R.A.Munna Aug 01 '17 at 06:32

score 0 · Answer 4 · answered Aug 01 '17 at 05:47

You can try this: `re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)`

R Palanivel-Tamilnadu India
