python regex groups

Question

I'm fairly new to Python and I'm trying to understand how to loop over regex groups, e.g.:

reobj = re.compile('<a href="(.*?)">(.*?)</a>', re.IGNORECASE)
result = reobj.findall(body)

How do I loop over the two groups from the regex? Thanks!

What do you mean, "loop the groups"? What are you trying to achieve? — Daniel Roseman, Aug 07 '11 at 09:19
The regex matches two groups, each written as (.*?), and I want to loop over both of them. — Pedro Lobito, Aug 07 '11 at 09:22
Considering your example, you may also be interested in http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Thomas Wouters, Aug 07 '11 at 10:00

score 6 · Answer 1 · answered Aug 07 '11 at 09:26

Did you actually try this in the shell?

>>> import re
>>> body = """<a href="http://foo.com">Foo</a><br><a href="http://bar.com">Bar</a>"""
>>> reobj = re.compile('<a href="(.*?)">(.*?)</a>', re.IGNORECASE)
>>> result = reobj.findall(body)
>>> result
[('http://foo.com', 'Foo'), ('http://bar.com', 'Bar')]

So the result of findall is simply a list of tuples: one tuple per match, each holding the matched groups in order. If you don't know how to iterate through a list, then you need to work through an introductory Python tutorial.
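For what it's worth, here is a minimal sketch of iterating over that list, reusing the sample body from the session above. The names pairs, href and text are only illustrative. Because every tuple has one element per capture group, it can be unpacked directly in the for statement:

```python
import re

body = '<a href="http://foo.com">Foo</a><br><a href="http://bar.com">Bar</a>'
reobj = re.compile('<a href="(.*?)">(.*?)</a>', re.IGNORECASE)

# findall returns one tuple per match, with one element per capture group.
pairs = reobj.findall(body)

# Unpack each (href, text) tuple directly in the loop header.
for href, text in pairs:
    print(href, text)
```

Note that findall only returns tuples when the pattern has two or more groups; with a single group it returns a flat list of strings.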

[insert standard rant about how you shouldn't use regex to parse HTML here...]
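If you do want to avoid the regex, one possible alternative is the standard library's html.parser module. The following is only a rough sketch, assuming Python 3; the LinkCollector class and its attributes are hypothetical names made up for this example, and it only handles simple, non-nested anchors:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, text) pairs for <a href=...> elements."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


body = '<a href="http://foo.com">Foo</a><br><a href="http://bar.com">Bar</a>'
collector = LinkCollector()
collector.feed(body)
collector.close()

for href, text in collector.links:
    print(href, text)
```

Unlike the regex, this does not depend on href being the first and only attribute or on the exact quoting style, which is the usual argument for using a parser here.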

Pedro Lobito · Accepted Answer · 2011-08-15T22:18:29.137

1

The answer I needed was:

import re

reobj = re.compile('<a href="(.*?)">(.*?)</a>', re.IGNORECASE)
result = reobj.findall(body)

# Each item is an (href, link text) tuple, one element per capture group.
for link in result:
    print(link[0] + link[1])
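A variation that may be useful, sketched here with a sample body borrowed from the other answer: finditer yields match objects instead of tuples, so each group can be read by number with group(), or all at once with groups(). The list name links is only illustrative.

```python
import re

body = '<a href="http://foo.com">Foo</a><br><a href="http://bar.com">Bar</a>'
reobj = re.compile('<a href="(.*?)">(.*?)</a>', re.IGNORECASE)

links = []
for match in reobj.finditer(body):
    # group(1) is the href and group(2) is the link text;
    # match.groups() would return both as a tuple.
    links.append((match.group(1), match.group(2)))
    print(match.group(1) + match.group(2))
```

Match objects also expose positions through start() and end(), which findall does not give you.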

edited Aug 15 '11 at 22:18

answered Aug 15 '11 at 22:13
