1

I need to parse HTML emails that will be similar but not exactly the same. I will be looking for things like dates, amounts, vendors, etc., but depending on who the email came from, the markup will be different.

How could I parse out those common things from lots of different HTML markup in Python?

Thanks for your suggestions.

Sam

3 Answers

7

You absolutely need to consider the Beautiful Soup library.

bioffe
2

You can use Beautiful Soup to parse HTML in Python.
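Since the markup differs per sender, one approach that may work is to ignore the tag structure entirely: flatten each email to plain text with Beautiful Soup's `get_text()`, then search that text with regular expressions. The snippet below is only a rough sketch under that assumption. The helper names (`html_to_text`, `extract_fields`), the two regex patterns, and the sample emails are all made up for illustration, and the standard-library fallback is there only so the snippet still runs without `bs4` installed. Vendors are left out because they usually need sender-specific rules.

```python
import re
from html.parser import HTMLParser

try:
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
except ImportError:  # fall back to the standard library if bs4 is missing
    BeautifulSoup = None


class _TextCollector(HTMLParser):
    """Stdlib fallback: collect the text nodes of a document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Flatten HTML to a single whitespace-normalized string."""
    if BeautifulSoup is not None:
        text = BeautifulSoup(html, "html.parser").get_text(" ")
    else:
        collector = _TextCollector()
        collector.feed(html)
        collector.close()
        text = " ".join(collector.chunks)
    return " ".join(text.split())


# Hypothetical patterns: ISO or US-style dates, and dollar amounts.
# Real emails will almost certainly need these tuned.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{4}\b")
AMOUNT_RE = re.compile(r"\$\s?\d[\d,]*(?:\.\d{2})?")


def extract_fields(html):
    """Return every date and amount found in the email's visible text."""
    text = html_to_text(html)
    return {
        "dates": DATE_RE.findall(text),
        "amounts": AMOUNT_RE.findall(text),
    }


# Two invented emails with different markup but the same kind of data.
email_a = (
    "<table><tr><td>Date</td><td>2014-10-20</td></tr>"
    "<tr><td>Total</td><td><b>$1,234.50</b></td></tr></table>"
)
email_b = "<div><p>Charged <span>$19.99</span> on 10/21/2014</p></div>"

for email in (email_a, email_b):
    print(extract_fields(email))
```

If a sender's layout is stable, it may be more reliable to target specific tags for that sender and keep the text-plus-regex pass as a generic fallback.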

nmichaels
  • @downvoter: Are you trying to get a badge for downvoting everything or something? The link's not dead and you didn't leave a comment. I'm assuming the same person downvoted all 3 answers here. – nmichaels Oct 20 '14 at 21:44
2

Beautiful Soup and lxml are both decent HTML parsers. Beautiful Soup is a bit handier, but it has some quirks.