Possible Duplicate:
Python regular expressions - how to capture multiple groups from a wildcard expression?
python regex of group match
I know there are better or easier ways to do this, but I tried this approach myself, it did not work, and I would like to understand why. Here is the problem:
Assume I want to extract XML attributes with a regex. Let's look at the following XML node:
<?xml version="1.0" encoding="UTF-8"?>
<Node key1="val1" key2="val2">
<OtherNode>
<!-- something -->
</OtherNode>
</Node>
To parse Node as well as OtherNode, I have the following regex:
import re
# Tag name, then zero or more key="value" attributes in the repeated group "meta"
pattern = re.compile(
    r'\s*?<(?P<key>[\w\d]+?)'
    r'\s*?(?P<meta>(?P<metakey>[\w:]+?)="(?P<metavar>.+?)"\s*)*>')
The output of pattern.findall(xml) is:
('Node', 'key2="val2"', 'key2', 'val2')
('OtherNode', '', '', '')
and the output of [m.groupdict() for m in pattern.finditer(xml)] is:
{'metakey': 'key2', 'meta': 'key2="val2"', 'metavar': 'val2', 'key': 'Node'}
{'metakey': None, 'meta': None, 'metavar': None, 'key': 'OtherNode'}
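To reproduce both results end to end, here is a self-contained sketch. It assumes the XML above is held in a string named `xml` (the question does not show how it is loaded) and writes the pattern with raw-string literals, which leaves the regex itself unchanged.

```python
import re

# The XML sample from the question, held in one string (assumed variable name).
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<Node key1="val1" key2="val2">
<OtherNode>
<!-- something -->
</OtherNode>
</Node>'''

# The question's pattern, unchanged apart from raw-string literals.
pattern = re.compile(
    r'\s*?<(?P<key>[\w\d]+?)'
    r'\s*?(?P<meta>(?P<metakey>[\w:]+?)="(?P<metavar>.+?)"\s*)*>')

# findall returns one tuple of all groups per match ('' for unmatched groups).
found = pattern.findall(xml)
# groupdict reports unmatched groups as None instead.
dicts = [m.groupdict() for m in pattern.finditer(xml)]

print(found)
print(dicts)
```

The printed values should match the results above, although the key order inside the dictionaries may differ between Python versions.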
It seems that only the last match of the repeated group is accessible: key2 and val2 are reported, while key1 and val1 are lost.
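This is how Python's `re` module treats any capturing group inside a quantifier: the group is overwritten on every iteration, and the match object keeps only the last one (the `re` documentation for `Match.group` states that if a group matched multiple times, the last match is returned). A minimal sketch, using a shortened attribute string chosen for illustration:

```python
import re

# The named groups sit inside (...)+ so they are matched once per attribute.
m = re.match(r'(?:(?P<metakey>\w+)="(?P<metavar>\w+)"\s*)+',
             'key1="val1" key2="val2"')

# The overall match covers both attributes...
whole = m.group(0)
# ...but each named group only remembers its final iteration.
last_key = m.group('metakey')
last_val = m.group('metavar')

print(whole, last_key, last_val)
```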
How can I match key1 as well as key2? Isn't it possible to capture more than one match with the (...)* construct? In other words, I want the regex to capture the named group meta once for each attribute present.
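Python's built-in `re` cannot hand back every iteration of a repeated group (the third-party `regex` package exposes them through `Match.captures()`, but that is outside the standard library). One common workaround is two passes: let the outer pattern capture the whole run of attributes as a single group, then run `findall` with a second pattern over that captured text. The following is a sketch under those assumptions; `tag_re`, `attr_re` and `tags_with_attributes` are hypothetical names, and the patterns assume double-quoted attribute values without embedded quotes.

```python
import re

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<Node key1="val1" key2="val2">
<OtherNode>
<!-- something -->
</OtherNode>
</Node>'''

# Pass 1: tag name plus the WHOLE attribute run, captured once as "attrs".
# The repetition is wrapped inside a single group, so nothing is overwritten.
tag_re = re.compile(r'<(?P<key>\w+)(?P<attrs>(?:\s+[\w:]+="[^"]*")*)\s*>')

# Pass 2: one match per key="value" pair inside the captured run.
attr_re = re.compile(r'(?P<metakey>[\w:]+)="(?P<metavar>[^"]*)"')


def tags_with_attributes(text):
    """Return (tag name, {attribute: value}) for every opening tag in text."""
    result = []
    for m in tag_re.finditer(text):
        # findall with two groups yields (metakey, metavar) tuples.
        attrs = dict(attr_re.findall(m.group('attrs')))
        result.append((m.group('key'), attrs))
    return result


print(tags_with_attributes(xml))
```

This is only meant for simple input like the sample above: self-closing tags such as `<a/>`, single-quoted values and `>` inside attribute values are not handled.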