0

sample input

 a = '(abc) * (j+2) * (abs(k)) * (log(sum(l)))'

sample output

['abc','j+2','abs(k)','log(sum(l))']

I tried using this

g = re.findall(r'\((.+?)\)',a)

the output I'm getting

['abc', 'j+2', 'abs(k', 'log(sum(l']

I can't figure out how to make it skip all the nested parentheses.

EDIT 1: I guess it's easy using the split method, but out of curiosity, how can it be done using regular expressions?

Anirudh Bandi
  • Use [parser](https://devdocs.io/python~2.7/library/parser) library – hjpotter92 Aug 04 '17 at 16:16
  • This cannot be done with a regular expression. Regular expressions have no way to count the number of left parens to match the number of right parens. What you need is a parser. – pat Aug 04 '17 at 16:18
  • If you can complete the task without regex, then you should do it without regex; regexes are much harder to refactor and debug (the verbose ones too). – d2718nis Aug 04 '17 at 16:21

6 Answers

2

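If you can assume that a space follows a closing parenthesis only at the first level, then this would work:

\((.+?)\)(?= |$)

The lookahead rejects a closing parenthesis unless it is followed by a space or by the end of the string. The `$` alternative is needed because the last group has nothing after it; with `(?= )` alone, `log(sum(l))` would be dropped from the result.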

Another possibility is to assume that the closing bracket of an inner group is always immediately followed by another closing bracket. In this case, the following will work:

\((.+?)\)(?!\))

What this does is it makes sure that there isn't a closing bracket immediately after the match.
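As a quick check, here is a minimal sketch that runs both lookahead patterns against the sample string from the question. The first pattern is written with `(?= |$)` so that the last group, which ends the string, is accepted as well. The variable names and the second test string `b` are illustrative and not part of the original question.

```python
import re

a = '(abc) * (j+2) * (abs(k)) * (log(sum(l)))'

# Accept a ')' only when a space or the end of the string follows it.
space_after = re.compile(r'\((.+?)\)(?= |$)')
# Accept a ')' only when it is not immediately followed by another ')'.
no_paren_after = re.compile(r'\((.+?)\)(?!\))')

print(space_after.findall(a))     # ['abc', 'j+2', 'abs(k)', 'log(sum(l))']
print(no_paren_after.findall(a))  # ['abc', 'j+2', 'abs(k)', 'log(sum(l))']

# Illustrative counterexample: a nested group followed by a space
# violates both assumptions, so both patterns stop too early.
b = '(abs(k) + 1)'
print(space_after.findall(b))     # ['abs(k']
print(no_paren_after.findall(b))  # ['abs(k']
```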

However, both of these approaches rely on assumptions that may not hold. When they don't, it is impossible to do this with a normal regex.
Refer to this question: Can regular expressions be used to match nested patterns?

The reason it is impossible is that regular expressions are based on finite state automata. An automaton has a finite number of states, and the only 'memory' it has is the state it is currently in. To count nested parentheses, it would need a distinct state for every possible nesting depth. If the depth is unbounded, that would require an infinite number of states, which goes against the basic concept.
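The missing 'memory' is essentially a depth counter, which is trivial to keep outside of a regex. The following is a minimal sketch of that idea, not a full parser; `top_level_groups` is a hypothetical helper name, and it assumes the only delimiters that matter are `(` and `)`.

```python
def top_level_groups(text):
    """Return the contents of every top-level (...) group in text."""
    groups = []
    depth = 0      # current nesting depth; this is the state a regex lacks
    start = None   # index just after the '(' that opened the current group
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ')':
            if depth == 0:
                raise ValueError('unbalanced ")" at index %d' % i)
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])
    if depth != 0:
        raise ValueError('unbalanced "(" in input')
    return groups


a = '(abc) * (j+2) * (abs(k)) * (log(sum(l)))'
print(top_level_groups(a))
# ['abc', 'j+2', 'abs(k)', 'log(sum(l))']
```

Because the counter is an ordinary integer, this works for any nesting depth and does not depend on spacing or on which operators separate the groups.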

Some regex implementations have, however, begun to include recursive expressions, which would solve this problem, for example PCRE, the regex engine for PHP. See http://php.net/manual/en/regexp.reference.recursive.php
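Python's built-in `re` module does not support recursive patterns, but the recursion can be unrolled by hand if you are willing to fix a maximum nesting depth in advance. The sketch below builds such a pattern; `bounded_depth_pattern` and its `max_depth` parameter are hypothetical names introduced for illustration. Note how the pattern grows with every extra level, which mirrors the 'enough states' argument above.

```python
import re


def bounded_depth_pattern(max_depth):
    """Compile a regex that captures the contents of a (...) group whose
    contents contain at most max_depth further levels of nesting."""
    inner = r'[^()]*'  # level 0: no parentheses inside at all
    for _ in range(max_depth):
        # One more level: plain characters, or a full group of the previous level.
        inner = r'(?:[^()]|\(' + inner + r'\))*'
    return re.compile(r'\((' + inner + r')\)')


a = '(abc) * (j+2) * (abs(k)) * (log(sum(l)))'
print(bounded_depth_pattern(2).findall(a))
# ['abc', 'j+2', 'abs(k)', 'log(sum(l))']
```

Input nested more deeply than `max_depth` is not handled correctly, so this only trades one assumption for another.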

Kaamil Jasani
0

For this case you can use:

a = '(abc) * (j+2) * (abs(k)) * (log(sum(l)))'
print([c[1:-1] for c in a.split(' * ')])
# ['abc', 'j+2', 'abs(k)', 'log(sum(l))']
d2718nis
0
You can try this:

a = '(abc) * (j+2) * (abs(k)) * (log(sum(l)))'
print(a[1:-1].split(') * ('))
# ['abc', 'j+2', 'abs(k)', 'log(sum(l))']

whackamadoodle3000
0

Try this:

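import re

a = '(abc) * (j+2) * (abs(k)) * (log(sum(l)))'
# Split on ") <operator> (" with optional whitespace around the operator.
# Inside a character class '|' is a literal, so the operators are listed directly.
regex = re.compile(r'\)\s*[*+/-]\s*\(')
b = regex.split(a[1:-1])
print(b)
# ['abc', 'j+2', 'abs(k)', 'log(sum(l))']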

The benefit here is that you will be able to add other operators if you wish (+, -, *, /).

Note: This will only work if the sequence "closing parenthesis, operator, opening parenthesis" never appears inside a group. (Ex. ((a)*(b)) will fail, because it is split into '(a' and 'b)'.)

digitaLink
0

Group 2 with this seems to work:

(^|.*?[^\(])\((.*?)\)([^\)].*|$)
miraliu
0

Something like this, using a lookahead:

a = '(abc) * (j+2) * (abs(k)) * (log(sum(l)))'
list( zip(*re.findall(r'\((.+?)\)(?=( |$))',a)) )[0]

Output:

('abc', 'j+2', 'abs(k)', 'log(sum(l))')
Transhuman