6

I know how to parse a page using Python. My question is: which of the parsing techniques is the fastest, and how much faster is it than the others?

The parsing techniques I know are XPath, DOM, BeautifulSoup, and Python's string `find` method.

Platinum Azure
codersofthedark
  • 5
    Pick a web page. Use the `timeit` module to test the execution times of the various mechanisms as they parse your selected source. Report which one is fastest. – larsks Dec 01 '11 at 13:54
  • Ha ha, I think I will now, because I am wondering how much parsing performance can vary between x86 and x64 ;) – codersofthedark Dec 01 '11 at 14:28
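
The `timeit` suggestion in the comment above could be sketched roughly as follows. This is only an illustration using standard-library parsers: the synthetic `PAGE` string, the helper names, and the choice of `xml.etree.ElementTree` (for its limited XPath support) and `xml.dom.minidom` (for DOM) are assumptions, not something from the thread. Real-world HTML is often not well-formed XML, so these two parsers may reject it; swap in your own page and parsers as needed.

```python
import timeit
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Synthetic, well-formed page (an assumption; substitute your own source).
PAGE = (
    "<html><body>"
    + "".join(f'<p><a href="/item/{i}">item {i}</a></p>' for i in range(1000))
    + "</body></html>"
)


def count_with_find(page):
    """Count '<a ' occurrences with plain str.find (no real parsing)."""
    count = 0
    pos = page.find("<a ")
    while pos != -1:
        count += 1
        pos = page.find("<a ", pos + 1)
    return count


def count_with_xpath(page):
    """Parse with ElementTree and query with its limited XPath subset."""
    return len(ET.fromstring(page).findall(".//a"))


def count_with_dom(page):
    """Build a DOM tree with minidom and look elements up by tag name."""
    return len(xml.dom.minidom.parseString(page).getElementsByTagName("a"))


def benchmark(repeats=5):
    """Return {technique name: best wall-clock seconds for one call}."""
    candidates = {
        "str.find": count_with_find,
        "ElementTree XPath": count_with_xpath,
        "minidom DOM": count_with_dom,
    }
    results = {}
    for name, func in candidates.items():
        timings = timeit.repeat(lambda: func(PAGE), number=1, repeat=repeats)
        results[name] = min(timings)
    return results


if __name__ == "__main__":
    for name, seconds in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print(f"{name:20s} {seconds * 1000:8.3f} ms")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes. The absolute numbers depend on the machine, the Python version, and the page, so treat them as relative comparisons only.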

2 Answers

10

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Comparison

Acorn
1

lxml is written in C, and if you are on x86 it is the best choice. As for techniques, there is no big difference between XPath and DOM; both are very fast. But `find` or `findAll` in BeautifulSoup will be slower than the others. BeautifulSoup is written in Python. It needs a lot of memory to parse any data and, of course, it uses the standard search methods from the Python libraries.
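
To check the lxml versus BeautifulSoup claim on your own machine, something like the sketch below might help. It assumes the third-party packages `lxml` and `bs4` (the current BeautifulSoup release, where `find_all` is the newer spelling of `findAll`) may or may not be installed, and it simply skips whichever is missing. The `PAGE` string and the helper names are made up for illustration.

```python
import timeit

# Both libraries are optional third-party installs; skip any that is missing.
try:
    from lxml import html as lxml_html
except ImportError:
    lxml_html = None

try:
    from bs4 import BeautifulSoup
except ImportError:
    BeautifulSoup = None

# Synthetic page (an assumption; use a real page for meaningful numbers).
PAGE = (
    "<html><body>"
    + "".join(f'<p><a href="/item/{i}">item {i}</a></p>' for i in range(200))
    + "</body></html>"
)


def links_lxml(page):
    """Extract href values with lxml's XPath support."""
    return lxml_html.fromstring(page).xpath("//a/@href")


def links_bs4(page):
    """Extract href values with BeautifulSoup's find_all."""
    soup = BeautifulSoup(page, "html.parser")
    return [a["href"] for a in soup.find_all("a")]


def available():
    """Return only the extractors whose library could be imported."""
    found = {}
    if lxml_html is not None:
        found["lxml XPath"] = links_lxml
    if BeautifulSoup is not None:
        found["BeautifulSoup find_all"] = links_bs4
    return found


def compare(repeats=3):
    """Return {name: best wall-clock seconds for one parse-and-extract}."""
    results = {}
    for name, func in available().items():
        timings = timeit.repeat(lambda: func(PAGE), number=1, repeat=repeats)
        results[name] = min(timings)
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name:25s} {seconds * 1000:8.3f} ms")
```

Note that BeautifulSoup can also be told to use lxml as its underlying parser (`BeautifulSoup(page, "lxml")`), which changes the comparison, so it is worth timing that variant as well if both packages are installed.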

SkyFox
  • Well said, a library written in C is always a lot faster than a pure Python module. Thanks for the update that lxml is written in C. I wanted to know why you mentioned x86. Can something perform better than lxml on x64? If yes, which one, and why? – codersofthedark Dec 01 '11 at 14:26
  • 2
    There is no difference between x86 and x64 in this context. I meant other platforms, like SPARC or ARM :) – SkyFox Dec 01 '11 at 14:27
  • Aww, okay, that won't be a problem in my case :) – codersofthedark Dec 01 '11 at 14:30