Got an interesting problem here for everyone to consider:
I am trying to parse and tokenize strings delimited by a "/"
character, but only when the "/" is not between parentheses.
For instance:
Root/Branch1/branch2/leaf
Should be tokenized as: "Root", "Branch1", "branch2", "leaf"
Root/Branch1(subbranch1/subbranch2)/leaf
Should be tokenized as: "Root", "Branch1(subbranch1/subbranch2)", "leaf"
Root(branch1/branch2) text (branch3/branch4) text/Root(branch1/branch2)/Leaf
Should be tokenized as: "Root(branch1/branch2) text (branch3/branch4) text", "Root(branch1/branch2)", "Leaf"
I came up with the following expression which works great for all cases except ONE!
([^/()]*\((?<=\().*(?=\))\)[^/()]*)|([^/()]+)
The only case where this does not work is the following test condition:
Root(branch1/branch2)/SubRoot/SubRoot(branch3/branch4)/Leaf
This should be tokenized as: "Root(branch1/branch2)", "SubRoot", "SubRoot(branch3/branch4)", "Leaf"
The result I get instead consists of only one group that matches the whole line so it is not tokenizing it at all:
"Root(branch1/branch2)/SubRoot/SubRoot(branch3/branch4)/Leaf"
What is happening here is that, because the ".*" in my expression is greedy, it matches from the leftmost opening parenthesis "(" all the way to the last closing parenthesis ")" instead of stopping at the ")" that closes that group.
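To make the greedy behaviour concrete, here is a minimal sketch in Python. Python is an assumption (the post does not say which engine is in use), although its re module accepts the original expression as written. TOKEN is a hypothetical alternative, not a patch to the original: it swaps ".*" for "[^()]*", which cannot run past a ")". It assumes the parentheses are balanced and never nested.

```python
import re

# The original expression from the post, unchanged.
ORIGINAL = re.compile(r"([^/()]*\((?<=\().*(?=\))\)[^/()]*)|([^/()]+)")

# Hypothetical alternative: a token is a run of plain characters and
# complete, non-nested "(...)" groups. [^()]* cannot cross a ")".
TOKEN = re.compile(r"(?:[^/()]|\([^()]*\))+")

line = "Root(branch1/branch2)/SubRoot/SubRoot(branch3/branch4)/Leaf"

# With the greedy ".*", the first match runs through to the last ")".
greedy_first = ORIGINAL.search(line).group(0)

# With the restricted class, each "(...)" group closes at its own ")".
tokens = TOKEN.findall(line)

print(greedy_first)
print(tokens)
```

Because the two alternatives inside TOKEN can never match the same character, the engine has no ambiguous ways to split a token, so backtracking stays cheap.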
Can any of you Regex gurus out there help me figure out how to add a small piece to my existing expression to handle this additional case?
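If a regex turns out not to be a hard requirement, the same rule (split on "/" only outside parentheses) can also be sketched as a plain scanner that tracks parenthesis depth. This is an alternative technique, not a fix to the expression above. The function name split_outside_parens and its sep parameter are made up for illustration, and Python is an assumption since no language is given.

```python
def split_outside_parens(text, sep="/"):
    """Split text on sep, ignoring any sep that sits inside parentheses."""
    tokens = []
    current = []
    depth = 0  # number of "(" currently open
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            # Separator at top level: close the current token.
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current))
    return tokens


line = "Root(branch1/branch2)/SubRoot/SubRoot(branch3/branch4)/Leaf"
print(split_outside_parens(line))
```

Unlike a single character class, the depth counter also copes with nested parentheses, at the cost of not being a one-line expression.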
Here is one more case it should handle:
Root(branch1/branch2) Test (branch3/branch4)/SubRoot/SubRoot(branch5/branch6)/Leaf
Should be tokenized into groups as:
"Root(branch1/branch2) Test (branch3/branch4)" "SubRoot" "SubRoot(branch5/branch6)" "Leaf"