My string looks like this:

string = "*[EQ](@[Type],'A,B,C',@[Type],*[EQ](@[Type],D,E,F))"

The ideal output list is:

['@[Type]', 'A,B,C', '@[Type]', '*[EQ](@[Type],D,E,F)']

So I can parse the string as:

if @[Type] in ('A,B,C') then @[Type] else *[EQ](@[Type],D,E,F)

The challenge is to find all the commas followed by @, ' or *. I've tried the following code but it doesn't work:

interM = re.search(r"\*\[EQ\]\((.+)(?=,@|,\*|,\')+,(.+)\)", string)
print(interM.groups())

Edit:

The ultimate goal is to parse out the 4 components of the input string:

*[EQ](Value, Target, ifTrue, ifFalse)
Nip
  • Where and why did the first `*EQ(` go? – vks Jun 19 '15 at 04:49
  • Your input seems to be a nested structure, which is something regular expressions are not the right tool for. Please either: explain its grammar in abstract terms or: show and confirm all variations you are dealing with. – Tomalak Jun 19 '15 at 04:51
  • @Tomalak more info added. can you explain a bit more why RE might not be a good tool for nested structure? sorry I'm new to python and this question seems a bit silly... – Nip Jun 19 '15 at 05:03
  • What is the expected behavior if there is a missing comma between expressions? – Joel Cornett Jun 19 '15 at 05:17
  • @Nip Regular expressions, at least as implemented in Python's `re` module, cannot match arbitrarily deep nesting. It's a technical limitation. If your `ifFalse` can contain unlimited levels, a regex will be unable to find the correct closing parenthesis. See here for more explanations and alternatives: http://stackoverflow.com/questions/1099178/matching-nested-structures-with-regular-expressions-in-python. There are any number of posts that discuss regex and nested input, if you search around. – Tomalak Jun 19 '15 at 06:34
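
The nesting concern raised in the comments can be sidestepped without regex by scanning the string once and tracking parenthesis depth. The following is only a minimal sketch, not code from the thread: the helper name `split_top_level` is hypothetical, and it assumes the outer wrapper is always exactly `*[EQ](` ... `)` and that single quotes are never escaped.

```python
def split_top_level(args):
    """Split on commas that sit outside parentheses and single quotes."""
    parts, current = [], []
    depth = 0          # current parenthesis nesting level
    in_quote = False   # True while inside '...'
    for ch in args:
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                # A top-level comma ends the current argument.
                parts.append("".join(current))
                current = []
                continue
        current.append(ch)
    parts.append("".join(current))
    return parts


string = "*[EQ](@[Type],'A,B,C',@[Type],*[EQ](@[Type],D,E,F))"
prefix = "*[EQ]("  # assumed fixed wrapper
inner = string[len(prefix):-1]
print(split_top_level(inner))
# ['@[Type]', "'A,B,C'", '@[Type]', '*[EQ](@[Type],D,E,F)']
```

Because only the depth counter decides where to split, the nested `ifFalse` part can itself contain further `*[EQ](...)` calls and still comes back as one piece.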

2 Answers

import re
x = "*[EQ](@[Type],'A,B,C',@[Type],*[EQ](@[Type],D,E,F))"
# Strip the outer *[EQ](...) wrapper, then match each argument in what remains.
print(re.findall(r"@[^,]+|'[^']+'|\*.*?\([^\)]*\)", re.findall(r"\*\[EQ\]\((.*?)\)$", x)[0]))

Output:

['@[Type]', "'A,B,C'", '@[Type]', '*[EQ](@[Type],D,E,F)']

You can try something along these lines. You have not described the general rules of the format, so I am not sure whether this scales to other inputs.
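
To make the scaling caveat concrete, here is a small check of the same two patterns against an input with one more level of nesting. The wrapper function `extract_args` and the `deeper` test string are made up for illustration and are not part of the question.

```python
import re


def extract_args(x):
    # Same two patterns as above, wrapped in a (hypothetical) helper.
    inner = re.findall(r"\*\[EQ\]\((.*?)\)$", x)[0]
    return re.findall(r"@[^,]+|'[^']+'|\*.*?\([^\)]*\)", inner)


original = "*[EQ](@[Type],'A,B,C',@[Type],*[EQ](@[Type],D,E,F))"
print(extract_args(original))

# Hypothetical input: the ifFalse branch contains another nested *[EQ](...).
deeper = "*[EQ](@[Type],'A',@[Type],*[EQ](@[Type],'B',*[EQ](@[X],'C',1,2),F))"
print(extract_args(deeper)[-1])
```

For `deeper`, the last element stops at the first closing parenthesis it meets, so the trailing `,F)` of the `ifFalse` branch is lost. The pattern `\([^\)]*\)` can only pair an opening parenthesis with the nearest closing one.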

vks
  • Hi! Thanks for the help! I add more information in the question, not sure if it clarifies though. – Nip Jun 19 '15 at 05:02
>>> import re
>>> string = "*[EQ](@[Type],'A,B,C',@[Type],*[EQ](@[Type],D,E,F))"
>>> re.split(r"^\*\[EQ\]\(|\)$|,(?=[@'*])", string)[1:-1]
['@[Type]', "'A,B,C'", '@[Type]', '*[EQ](@[Type],D,E,F)']

However, if you are looking for a more robust solution, I'd highly recommend a lexical analyzer such as flex.
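
flex generates scanners in C, so for a Python script a small hand-written recursive-descent parser may be a lighter-weight way to get the same robustness. The sketch below is not flex and not the approach used above. The function name `parse_expr` and the tuple-based tree shape are assumptions for illustration, and it assumes quotes are never escaped and bare tokens contain no commas or parentheses.

```python
def parse_expr(text, pos=0):
    """Parse one expression starting at text[pos]; return (node, next_pos).

    A call like *[EQ](a,b,c,d) becomes ("EQ", [a, b, c, d]);
    anything else is returned as a plain string.
    """
    if text.startswith("*[", pos):
        close = text.index("]", pos)
        name = text[pos + 2:close]
        if text[close + 1] != "(":
            raise ValueError("expected '(' at position %d" % (close + 1))
        pos = close + 2
        args = []
        while True:
            node, pos = parse_expr(text, pos)  # recurse for each argument
            args.append(node)
            if pos >= len(text):
                raise ValueError("unbalanced parentheses")
            if text[pos] == ")":
                return (name, args), pos + 1
            if text[pos] != ",":
                raise ValueError("expected ',' or ')' at position %d" % pos)
            pos += 1  # skip the comma between arguments
    if text.startswith("'", pos):
        # Quoted literal: keep everything up to the closing quote.
        end = text.index("'", pos + 1)
        return text[pos:end + 1], end + 1
    # Bare token: runs until the next comma or closing parenthesis.
    end = pos
    while end < len(text) and text[end] not in ",)":
        end += 1
    return text[pos:end], end


string = "*[EQ](@[Type],'A,B,C',@[Type],*[EQ](@[Type],D,E,F))"
tree, end = parse_expr(string)
name, (value, target, if_true, if_false) = tree
print(value, target, if_true, if_false)
```

Here `if_false` comes back as a nested `("EQ", [...])` tuple rather than a string, so the four components `Value, Target, ifTrue, ifFalse` are available at every level without re-parsing.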

gbrener