0

I need to do screen scraping and for that I need to read some xml from python. I want to get a proper DOM tree out of it. How can I do that?

Reinout van Rees
  • 13,486
  • 2
  • 36
  • 68
simran
  • 1

2 Answers2

1

Check out the minidom package which also has examples.

BTW if your screen scraping is HTML don't use XML parsing. There's other stuff for that. (Question about screen scraping, Question about python HTML screen scraping).

Community
  • 1
  • 1
extraneon
  • 23,575
  • 2
  • 47
  • 51
  • And there's other, more comfortable stuff than minidom, mainly the ElementTree API :) –  Jul 12 '10 at 18:37
  • 1
    I happen to like minidom. It works, and it's always included in Python. Easier to get it deployed that way, especially if the servers are maintained by others. – extraneon Jul 12 '10 at 18:39
0

The lxml library works well for scraping HTML. Here are some links to get you started:

ars
  • 120,335
  • 23
  • 147
  • 134