
Possible Duplicate:
Python regular expressions - how to capture multiple groups from a wildcard expression?
python regex of group match

I know there are better and easier ways to do this, but since I tried it myself and it did not work, I am curious why. Here is the problem:

Assume I want to extract XML attributes with a regex. Let's look at the following XML node:

<?xml version="1.0" encoding="UTF-8"?> 
<Node key1="val1" key2="val2">
    <OtherNode>
        <!-- something -->
    </OtherNode>
</Node>

To parse Node as well as OtherNode, I use the following regex:

import re

# Adjacent raw-string literals are joined into a single pattern.
pattern = re.compile(
    r'\s*?<(?P<key>[\w\d]+?)'
    r'\s*?(?P<meta>(?P<metakey>[\w:]+?)="(?P<metavar>.+?)"\s*)*>'
)

The output of pattern.findall(xml) is:

[('Node', 'key2="val2"', 'key2', 'val2'), ('OtherNode', '', '', '')]

And the output of [m.groupdict() for m in pattern.finditer(xml)] is:

{'metakey': 'key2', 'meta': 'key2="val2"', 'metavar': 'val2', 'key': 'Node'}
{'metakey': None, 'meta': None, 'metavar': None, 'key': 'OtherNode'}
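
A self-contained reproduction of the two outputs above, as a sketch for Python 3. The `xml` string is the document from the question, and the pattern is the one shown earlier with its wrapped line joined as two adjacent raw-string literals (an assumption about how the original was laid out). Dict key order may differ from the listing above, which came from an older Python, but the contents are the same.

```python
import re

# The XML document from the question, as a plain string.
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<Node key1="val1" key2="val2">
    <OtherNode>
        <!-- something -->
    </OtherNode>
</Node>'''

# The question's pattern; `(?P<meta>...)*` is the repeated group in question.
pattern = re.compile(
    r'\s*?<(?P<key>[\w\d]+?)'
    r'\s*?(?P<meta>(?P<metakey>[\w:]+?)="(?P<metavar>.+?)"\s*)*>'
)

tuples = pattern.findall(xml)
dicts = [m.groupdict() for m in pattern.finditer(xml)]
print(tuples)  # [('Node', 'key2="val2"', 'key2', 'val2'), ('OtherNode', '', '', '')]
print(dicts)
```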

It seems that only the last match of the repeated meta group (and therefore only the last metakey/metavar pair) is accessible.
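
This matches documented `re` behaviour: when a group sits inside a part of the pattern that matches several times, the match object keeps only the last capture. A minimal illustration, using a hypothetical attribute string instead of the full XML:

```python
import re

# One key="value" sub-pattern, repeated with `+` (the question uses `*`).
attr_run = re.compile(r'(?:(\w+)="([^"]*)"\s*)+')
m = attr_run.match('key1="val1" key2="val2"')

whole = m.group(0)  # the entire run of attributes was consumed...
last = m.groups()   # ...but each group remembers only its final capture
print(whole)
print(last)         # ('key2', 'val2')
```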

How can I match key1 as well as key2? Isn't it possible to capture a group more than once with the (...)* construct? In other words, I want the regex to capture the named group meta once for every attribute present.
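
Since `re` cannot return every repetition of a group, a common workaround is two passes: capture the whole attribute run as one group, then pull the individual pairs out of it with `findall`. A minimal sketch, assuming double-quoted values containing no `"` or `>`; `tag_re` and `attr_re` are illustrative names, not part of the question:

```python
import re

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<Node key1="val1" key2="val2">
    <OtherNode>
        <!-- something -->
    </OtherNode>
</Node>'''

# Pass 1: each opening tag, with its name and the *whole* attribute run as one group.
tag_re = re.compile(r'<(?P<key>\w+)(?P<attrs>(?:\s+[\w:]+="[^"]*")*)\s*>')
# Pass 2: the individual key="value" pairs inside that run.
attr_re = re.compile(r'(?P<metakey>[\w:]+)="(?P<metavar>[^"]*)"')

tags = [
    # findall with two groups yields (metakey, metavar) tuples, which dict() accepts.
    (m.group('key'), dict(attr_re.findall(m.group('attrs'))))
    for m in tag_re.finditer(xml)
]
print(tags)
```

If a single pass is required, the third-party `regex` module offers `Match.captures()`, which returns every capture of a repeated group.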

Rafael T
  • If you want to _parse xml_, consider using an XML parser like [`lxml`](http://lxml.de/) – Burhan Khalid Sep 02 '12 at 13:48
  • That's what I mentioned first: I KNOW there are XML parsers; I'm just wondering why I cannot match a group more than once – Rafael T Sep 02 '12 at 13:49
  • Check this answer: http://stackoverflow.com/questions/464736/python-regular-expressions-how-to-capture-multiple-groups-from-a-wildcard-expr – John P Sep 02 '12 at 13:55
  • @hayden Have you ever heard of lazy quantifiers? `*` is a greedy quantifier, whereas `*?` and `+?` are lazy, as indicated by the trailing `?`. Read the docs: http://docs.python.org/library/re.html – Rafael T Sep 02 '12 at 22:31
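
For comparison, the parser route from the first comment. lxml implements the ElementTree API; to stay within the standard library, this sketch swaps in `xml.etree.ElementTree` and passes the document as UTF-8 bytes to match its encoding declaration:

```python
import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<Node key1="val1" key2="val2">
    <OtherNode>
        <!-- something -->
    </OtherNode>
</Node>'''

root = ET.fromstring(xml.encode('utf-8'))
# iter() visits the root and all descendants in document order;
# the default parser drops the comment, so only elements appear.
attrs = [(el.tag, dict(el.attrib)) for el in root.iter()]
print(attrs)
```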
