I'm new to Python and I need a regular expression to retrieve the title and the link from elements of this format:

<a href="anything" class="anything" title="Size: anything">anything</a>
Martijn Pieters
miamia
  • http://stackoverflow.com/a/1732454/2032663 –  Apr 16 '13 at 09:56
  • Except of course that you wouldn't be trying to parse HTML with regex, would you? No-one would do that :-) – Daniel Roseman Apr 16 '13 at 09:56
  • The reason you're getting downvoted isn't that the question itself is badly written. Rather, the very presence of this question shows a lack of prior research on your part: it has been asked and answered a thousand times. HTML is not regular, and hence regex is not the right technology to handle it. Further, you haven't given any indication of what you've actually tried. – Sepster Apr 16 '13 at 10:04

1 Answer

You'd be much better off using a decent HTML parser. Use BeautifulSoup, which has extensive documentation. For example:

from bs4 import BeautifulSoup

# html holds the page source as a string
soup = BeautifulSoup(html, 'html.parser')

for link in soup.find_all('a', class_='anything'):
    print(link['href'], link.text)

This finds all <a> elements with the class anything, then prints their URL and link text.

Regular expressions are usually not the tool for parsing HTML.
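
As a self-contained sketch of the approach above: the sample markup and the class name `download` are made up to match the format in the question, and `'html.parser'` is Python's built-in parser backend, picked here only so that nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input, modelled on the format in the question.
html = '''
<a href="/files/1" class="download" title="Size: 10 MB">First file</a>
<a href="/files/2" class="download" title="Size: 2 GB">Second file</a>
<a href="/about" class="nav">About</a>
'''

soup = BeautifulSoup(html, 'html.parser')

# Collect (href, text) pairs for every <a> element with the class "download".
links = [(a['href'], a.text) for a in soup.find_all('a', class_='download')]

for href, text in links:
    print(href, text)
```

The third link is skipped because its class does not match the filter.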

Jon Clements
Martijn Pieters
  • This is not parsing, guys. I just need to retrieve general information, nothing specific. Here "anything" stands for any string, not the literal text "anything". – miamia Apr 16 '13 at 10:03
  • @MennouchiAzeddineIslam: This is an *example*. You can easily adjust it to your specific situation. Remove the `class_='anything'` filter for example. And yes, you have a parsing task at hand. – Martijn Pieters Apr 16 '13 at 10:04
  • Thank you. Just one more question: how can I get the title field, I mean title="Size: anything"? – miamia Apr 16 '13 at 10:05
  • @MennouchiAzeddineIslam: `link['title']`. I linked to the BeautifulSoup website; there is excellent documentation to be found there. – Martijn Pieters Apr 16 '13 at 10:07
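
Putting the last two comments together, a rough sketch might drop the class filter and read the title attribute as well. The sample markup and the `SIZE_PREFIX` constant are assumptions for illustration, not part of the original answer. Passing `href=True` and `title=True` restricts the search to elements that actually carry those attributes, so `link['title']` cannot raise a `KeyError`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input; the second link has no title attribute.
html = '''
<a href="/files/1" class="x" title="Size: 10 MB">First file</a>
<a href="/about">About</a>
'''

# Assumed prefix, taken from the title="Size: anything" format in the question.
SIZE_PREFIX = 'Size: '

soup = BeautifulSoup(html, 'html.parser')

results = []
# No class filter: match any <a> that has both an href and a title.
for link in soup.find_all('a', href=True, title=True):
    title = link['title']
    if title.startswith(SIZE_PREFIX):
        # Keep the URL, the part of the title after "Size: ", and the link text.
        results.append((link['href'], title[len(SIZE_PREFIX):], link.text))

print(results)
```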