I'm new to Python and I need a regular expression to retrieve the title and the link from tags of this format:
<a href="anything" class="anything" title="Size: anything">anything</a>
You'd be much better off using a decent HTML parser. BeautifulSoup is a good choice and has extensive documentation. For example:
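from bs4 import BeautifulSoup

# html_doc holds the page source as a string
soup = BeautifulSoup(html_doc, 'html.parser')
for link in soup.find_all('a', class_='anything'):
    # print each matching link's URL and its text
    print(link['href'], link.text)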
This finds all <a> elements with the class anything, then prints their URL and link text.
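Since the question also asks for the title attribute, here is a minimal sketch along the same lines that collects the href and title of each matching link. The sample markup, the `extract_links` helper name, and the choice to keep only titles starting with "Size: " are assumptions made for illustration; adjust the filter to whatever your real pages contain.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup in the shape described in the question.
html_doc = '''
<a href="/files/1" class="dl" title="Size: 10 MB">First file</a>
<a href="/files/2" class="dl" title="Size: 2 GB">Second file</a>
<a href="/about" class="nav">About</a>
'''

def extract_links(markup):
    """Return (href, title) pairs for <a> tags whose title starts with 'Size: '."""
    soup = BeautifulSoup(markup, 'html.parser')
    results = []
    # href=True keeps only anchors that have an href attribute.
    # The title filter is called with each tag's title value (None if absent).
    has_size_title = lambda t: t is not None and t.startswith('Size: ')
    for a in soup.find_all('a', href=True, title=has_size_title):
        results.append((a['href'], a['title']))
    return results

links = extract_links(html_doc)
for href, title in links:
    print(href, title)
```

Attributes are read with dictionary-style access (`a['title']`), so no pattern has to account for attribute order or quoting, which is where a hand-written regular expression tends to break.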
Regular expressions are usually not the right tool for parsing HTML.