Forget regex. Doing this with a regex is going to be error-prone and unreliable. There will always be little edge cases that a regex can't handle well.
What you really need is a context-free grammar. Use pyparsing.
>>> from pyparsing import OneOrMore, Regex, Optional
>>> pairListParser = OneOrMore(u'Key=' + Regex(u'[^,]+') + u',Value=' + Regex(u'[^, ]+') + Optional(Regex(u',? ')))
>>> x = u'Key=key1,Value={"a=":"b"}, Key=key2,Value=value2, Key=key3,Value={"c":{"d":"e"}}'
>>> pairListParser.parseString(x, parseAll=True)
([u'Key=', u'key1', u',Value=', u'{"a=":"b"}', u', ', u'Key=', u'key2', u',Value=', u'value2', u', ', u'Key=', u'key3', u',Value=', u'{"c":{"d":"e"}}'], {})
Note that in the example above, I assumed that keys cannot contain a comma (,) and that values cannot contain a comma (,) or a space ( ). I did so for simplicity, but with pyparsing you can rework the parser to allow for those cases. It's just a matter of doing the work to figure it out, whereas a true regular expression cannot match arbitrarily nested structures such as the braces in these values, so lifting those restrictions is out of its reach.
Then you just need to pull out the results.
>>> parsedX = pairListParser.parseString(x, parseAll=True)
>>> parsedXIter = iter(i for i in parsedX if i not in (u'Key=', u',Value=', u', '))
>>> result = dict(zip(parsedXIter, parsedXIter))
>>> result
{u'key3': u'{"c":{"d":"e"}}', u'key2': u'value2', u'key1': u'{"a=":"b"}'}
(There are probably better ways to pull out the results, but this was quick and dirty. Notably, pyparsing has capabilities that let you discard certain elements or transform the results while it parses.)
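As a rough sketch of that idea, the version below wraps the literal pieces in pyparsing's Suppress so they never reach the results, and wraps each pair in Group so the output is a list of [key, value] pairs. The names key, value, pair and pair_list are mine, and the same assumptions about commas and spaces still apply; treat it as one possible arrangement rather than the definitive one.

```python
from pyparsing import Group, OneOrMore, Optional, Regex, Suppress

# Same restrictions as before: keys contain no comma,
# values contain no comma and no space.
key = Regex(u'[^,]+')
value = Regex(u'[^, ]+')

# Suppress drops the matched text from the results;
# Group keeps each key/value pair together as a sub-list.
pair = Group(
    Suppress(u'Key=') + key
    + Suppress(u',Value=') + value
    + Suppress(Optional(Regex(u',? ')))
)
pair_list = OneOrMore(pair)

x = u'Key=key1,Value={"a=":"b"}, Key=key2,Value=value2, Key=key3,Value={"c":{"d":"e"}}'
parsed = pair_list.parseString(x, parseAll=True)

# Each group is a two-element list, so dict() can consume the result directly.
result = dict(parsed.asList())
print(result)
```

With this arrangement there is nothing left to filter out afterwards, so the zip trick is no longer needed.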
Once you have the results in a dict, you can do whatever you want with the values:
import re

for k, v in result.items():
    # Values wrapped in braces look like embedded objects; capture what is inside.
    m = re.match(u'^{(.+)}$', v)
    if m:
        print(m.groups())
I imagine it would be better to parse them as JSON or something like that, but the point is now you've cut off all the stuff around the value and can work with just the value in isolation.
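If the brace-wrapped values really are JSON (an assumption on my part; the sample data happens to be valid JSON), a minimal sketch of that approach could look like the following. The helper name decode_value is hypothetical, and the dict literal just repeats the parsed result from above so the snippet runs on its own.

```python
import json

def decode_value(raw):
    """Return the JSON-decoded value, or the raw string if it is not valid JSON."""
    try:
        return json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return raw

# The parsed key/value pairs from the example above.
result = {
    u'key1': u'{"a=":"b"}',
    u'key2': u'value2',
    u'key3': u'{"c":{"d":"e"}}',
}

decoded = {k: decode_value(v) for k, v in result.items()}
print(decoded)
```

Plain values such as value2 are not valid JSON, so they come back unchanged, while the brace-wrapped ones become nested dicts.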