I have the following string:

(some text) or ((other text) and (some more text)) and (still more text)

I would like a Python regular expression that splits it up into

['(some text)', '((other text) and (some more text))', '(still more text)']

I've tried this but it doesn't work:

haystack = "(some text) or ((other text) and (some more text)) and (still more text)"
re.split('(or|and)(?![^(]*.\))', haystack) # no worky

Any help is appreciated.

Sid Kwakkel
  • Regex doesn't handle arbitrarily nested content very well. Beyond the example you showed us, there could be even more layers of nested parentheses. For this situation, using a parser might get you further than a regex. – Tim Biegeleisen Aug 01 '17 at 05:37
  • This may help: https://stackoverflow.com/questions/26633452/how-to-split-by-commas-that-are-not-within-parentheses – Christian Dean Aug 01 '17 at 05:37
  • This might also be useful: https://stackoverflow.com/questions/4284991/parsing-nested-parentheses-in-python-grab-content-by-level – perigon Aug 01 '17 at 05:44
  • (and|or)(?![^()]*\)) doesn't work either @Christian_Dean – Sid Kwakkel Aug 01 '17 at 05:46
  • How about using `"(\(.+?\)) or (\(.*\)) and (\(.+?\))"` to extract the content from the original string? – stamaimer Aug 01 '17 at 05:49

4 Answers

This solution works for arbitrarily nested parentheses, which a regex in Python's re module cannot handle because it has no recursion (s is the original string):

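from pyparsing import nestedExpr

s = "(some text) or ((other text) and (some more text)) and (still more text)"

def lst_to_parens(elt):
    # Rebuild a parenthesized string from a (possibly nested) token list.
    if isinstance(elt, list):
        return '(' + ' '.join(lst_to_parens(e) for e in elt) + ')'
    else:
        return elt

# Wrap s in one outer pair of parentheses so the whole string parses as a single nested expression.
split = nestedExpr('(', ')').parseString('(' + s + ')').asList()
# Keep only the top-level groups; bare tokens such as 'or' and 'and' are dropped.
split_lists = [elt for elt in split[0] if isinstance(elt, list)]
print([lst_to_parens(elt) for elt in split_lists])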

Output:

['(some text)', '((other text) and (some more text))', '(still more text)']

For OP's real test case:

s = "(substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle')"

Output:

["(substringof ('needle' ,name))", "((role eq 'needle') and (substringof ('needle' ,email)))", "(job eq 'needle')", "(office eq 'needle')"]
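
If installing pyparsing is not an option, the same top-level grouping can be sketched with a plain depth counter. This is not part of the answer above, only a minimal illustration: split_top_level_groups is a hypothetical helper name, and it assumes that parentheses never occur inside quoted literals such as 'a(b'.

```python
def split_top_level_groups(text):
    """Return every top-level parenthesized group in text, in order.

    Assumption: parentheses inside quoted literals are not treated
    specially, so a literal like 'a(b' would confuse the counter.
    """
    groups = []
    depth = 0      # current nesting depth
    start = None   # index where the current top-level group opened
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ')':
            if depth == 0:
                raise ValueError("unbalanced ')' at index %d" % i)
            depth -= 1
            if depth == 0:
                # A top-level group just closed; keep it verbatim.
                groups.append(text[start:i + 1])
    if depth != 0:
        raise ValueError("unbalanced '(' in input")
    return groups


s = "(some text) or ((other text) and (some more text)) and (still more text)"
print(split_top_level_groups(s))
# ['(some text)', '((other text) and (some more text))', '(still more text)']
```

Because the groups are sliced straight out of the input, the original spacing is preserved, unlike the re-joined pyparsing tokens shown above.
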
perigon

I would use re.findall instead of re.split. Note that this works only for brackets nested up to depth 2.

>>> import re
>>> s = '(some text) or ((other text) and (some more text)) and (still more text)'
>>> re.findall(r'\((?:\((?:\([^()]*\)|[^()]*)*\)|[^()])*\)', s)
['(some text)', '((other text) and (some more text))', '(still more text)']
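
Since a pattern like this is unrolled by hand for one fixed depth, here is a rough sketch of generating the same kind of pattern for any chosen depth. nested_paren_pattern is a hypothetical helper, not something the re module provides, and the generated pattern is equivalent in spirit rather than character-for-character identical to the one above.

```python
import re


def nested_paren_pattern(depth):
    """Build a regex matching parentheses nested at most `depth` levels deep.

    depth=1 matches '(...)' containing no further parentheses; every extra
    level wraps the previous pattern in one more pair of parentheses.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    pattern = r'\([^()]*\)'
    for _ in range(depth - 1):
        # Inside the new outer pair, allow either a non-parenthesis
        # character or a complete group of the previous depth.
        pattern = r'\((?:[^()]|' + pattern + r')*\)'
    return pattern


s = '(some text) or ((other text) and (some more text)) and (still more text)'
print(re.findall(nested_paren_pattern(2), s))
# ['(some text)', '((other text) and (some more text))', '(still more text)']
```

The depth still has to be chosen up front, so input nested deeper than that is split incorrectly; that is the point where a parser becomes the better tool.
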
Avinash Raj
  • I tried to simplify my string and it backfired. Your solution does not work for my real string... (substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle') – Sid Kwakkel Aug 01 '17 at 05:59
  • @user1571934 How about this: https://regex101.com/r/zDmF3s/1 ? If the nesting goes any deeper, drop the idea of using a regex and write your own parser. – Avinash Raj Aug 01 '17 at 06:01

You can also try this:

import re
s = '(some text) or ((other text) and (some more text)) and (still more text)'
find_string = re.findall(r'[(]{2}[a-z\s()]*[)]{2}|[(][a-z\s]*[)]', s)
print(find_string)

output:

['(some text)', '((other text) and (some more text))', '(still more text)']

Edit

find_string = re.findall(r'[(\s]{2}[a-z\s()]*[)\s]{2}|[(][a-z\s]*[)]', s)
R.A.Munna

You can try this: re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)