I'm fairly new to Python and I'm trying to understand how to loop over regex groups, e.g.:
reobj = re.compile('<a href="(.*?)">(.*?)</a>', re.IGNORECASE)
result = reobj.findall(body)
How do I loop over the 2 groups from the regex? Thanks!
Did you actually try this in the shell?
>>> import re
>>> body = """<a href="http://foo.com">Foo</a><br><a href="http://bar.com">Bar</a>"""
>>> reobj = re.compile('<a href="(.*?)">(.*?)</a>', re.IGNORECASE)
>>> result = reobj.findall(body)
>>> result
[('http://foo.com', 'Foo'), ('http://bar.com', 'Bar')]
So the result of findall is simply a list of tuples containing the matched groups. If you don't know how to iterate through a list, then you need to do an introductory Python tutorial.
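Since each element of that list is a 2-tuple, one way to loop over both groups at once is tuple unpacking in the for statement. A minimal sketch, assuming Python 3 and the same sample body as in the shell session above; the names url, text and pairs are just illustrative:

```python
import re

body = '<a href="http://foo.com">Foo</a><br><a href="http://bar.com">Bar</a>'
reobj = re.compile(r'<a href="(.*?)">(.*?)</a>', re.IGNORECASE)

pairs = []
# findall returns one tuple per match when the pattern has several groups,
# so each tuple can be unpacked straight into two loop variables.
for url, text in reobj.findall(body):
    pairs.append((url, text))
    print(url, text)
```

Unpacking avoids the link[0] / link[1] indexing and makes it obvious which group is which.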
[insert standard rant about how you shouldn't use regex to parse HTML here...]
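For what that rant usually recommends instead, here is a rough sketch using the standard library's html.parser module (Python 3). The LinkCollector class and its attribute names are made up for this example, and it only handles simple, non-nested links:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text pieces seen inside that tag

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # only keep text that appears inside an <a> that has an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


body = '<a href="http://foo.com">Foo</a><br><a href="http://bar.com">Bar</a>'
collector = LinkCollector()
collector.feed(body)
collector.close()

for href, text in collector.links:
    print(href, text)
```

Unlike the regex, this does not care about attribute order or extra attributes on the tag, which is the usual reason for preferring a parser.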
The answer I needed was:
reobj = re.compile('<a href="(.*?)">(.*?)</a>', re.IGNORECASE)
result = reobj.findall(body)
for link in result:
    # link is a (url, text) tuple: link[0] is group 1, link[1] is group 2
    print(link[0] + link[1])
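If more than the captured strings is needed (for example where each match sits in the input), finditer is a possible alternative to findall: it yields match objects rather than tuples. A small sketch, assuming Python 3 and the same sample body as earlier in the thread; the found list is only there to hold the results:

```python
import re

body = '<a href="http://foo.com">Foo</a><br><a href="http://bar.com">Bar</a>'
reobj = re.compile(r'<a href="(.*?)">(.*?)</a>', re.IGNORECASE)

found = []
for match in reobj.finditer(body):
    # group(0) is the whole match; group(1) and group(2) are the two captures
    url = match.group(1)
    text = match.group(2)
    # start() gives the offset of the match within body
    found.append((url, text, match.start()))
    print(url, text, match.start())
```

For simply printing the two groups, the findall loop above is enough; finditer mainly pays off when positions or the full matched text matter.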