One technique uses features from regex, a more powerful third-party regex implementation. Don't worry, it's backwards-compatible with the standard re module. The basic idea also works in standard re, but it's a bit more fiddly; I outline the stdlib method at the end of this answer.
# pip install regex
import regex as re
s1 = 'type1/type2/type3'
s2 = 'type1/type2/type3(a/c)'
s3 = 'A/(B/C(D/E))/F'
s4 = 'A/(B/C(D/E))/F(a/c)'
Here's the pattern:
pat = r'\(.*?\)(*SKIP)(*FAIL)|/'
Demo:
>>> re.split(pat, s1)
['type1', 'type2', 'type3']
>>> re.split(pat, s2)
['type1', 'type2', 'type3(a/c)']
>>> re.split(pat, s3)
['A', '(B/C(D/E))', 'F']
>>> re.split(pat, s4)
['A', '(B/C(D/E))', 'F(a/c)']
How does it work? Read the regex like this:
blacklisted(*SKIP)(*FAIL)|matched
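The left alternative matches from an opening paren to the nearest closing paren, i.e. \(.*?\), and then throws that match away with (*SKIP)(*FAIL), a feature that stdlib re does not have yet. (*SKIP) also keeps the engine from retrying inside the discarded text, so slashes in there are never considered. Whatever is left is tried against the right-hand side of the |, i.e. a slash.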
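To see what the "blacklisted" half consumes on its own, here is a quick check of just the \(.*?\) part. It's only a sketch: it uses stdlib re (no (*SKIP)(*FAIL) involved), and the variable name blacklisted is mine.

```python
import re  # stdlib re is enough for this check

s4 = 'A/(B/C(D/E))/F(a/c)'

# Each hit runs from an opening paren to the nearest closing paren.
blacklisted = re.findall(r'\(.*?\)', s4)
print(blacklisted)  # ['(B/C(D/E)', '(a/c)']
```

Note that the first chunk stops at the first closing paren, so the pattern does not balance nested parens. The sample strings work because no slash sits between an inner closing paren and its outer one.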
As I mentioned, the technique is also possible in standard re, but you have to use capture groups. The pattern needs a capture group surrounding the slash on the right side:
pat_ = r'\(.*?\)|(/)'
Group 1 will be set for the "good" matches. So iterating like this:
>>> for match in re.finditer(pat_, s4):
...     if match[1] is not None:
...         print(match.start())
...
1
12
This prints the indices that you need to split at. It's then trivial to split the string programmatically. You can actually do it with regex directly, using re.sub and re.split, but it's cleaner and easier to do the split in Python code once you have the indices.
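For completeness, here is roughly what that index-based split could look like with the stdlib module. Treat it as a minimal sketch: the helper name split_outside_parens and the constant PAT are made up for this example, and it inherits the same non-greedy paren matching as the pattern above.

```python
import re  # stdlib re here, not the third-party regex module

# Left side swallows parenthesised chunks; group 1 captures a "good" slash.
PAT = re.compile(r'\(.*?\)|(/)')

def split_outside_parens(s):
    """Split s at every slash that matched via capture group 1."""
    parts = []
    last = 0
    for match in PAT.finditer(s):
        if match[1] is not None:          # group 1 set: slash outside parens
            parts.append(s[last:match.start()])
            last = match.end()
    parts.append(s[last:])                # remainder after the final split point
    return parts

print(split_outside_parens('A/(B/C(D/E))/F(a/c)'))
# ['A', '(B/C(D/E))', 'F(a/c)']
```

Matches where group 1 is None are the parenthesised chunks; they are skipped, so their text stays inside whichever part surrounds them.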