
I'm writing a Python script that scrapes some pages from my web server and writes them to a file. I'm using the mechanize.Browser class for this particular task.

However, I've found that fetching pages through a single mechanize.Browser() instance is rather slow. Is there a way I could, relatively painlessly, use multithreading/multiprocessing (i.e. issue several GET requests at once)?
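To illustrate the shape of what I'm after, here is a sketch using the stdlib thread pool; `fetch` is a stand-in for the real per-page request code, not part of my actual script:

```python
# Sketch: issue several GETs at once via a thread pool.
# `fetch` is a placeholder for the real per-page request code
# (e.g. a function that drives its own mechanize.Browser,
# since Browser instances shouldn't be shared across threads).
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch, workers=8):
    """Call fetch(url) for each URL, up to `workers` at a time.

    Threads suit this workload: each one spends most of its time
    blocked on network IO, so the GIL is not a bottleneck.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input URLs
        return list(pool.map(fetch, urls))
```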

asked by Bo Milanovich · edited by cwallenpoole
  • Have you looked at the Python [threading](http://docs.python.org/library/threading.html) module? – ObscureRobot Oct 20 '11 at 05:10
  • Isn't threading module only for starting a new CPU thread? – Bo Milanovich Oct 20 '11 at 05:12
  • Related: http://stackoverflow.com/questions/4119680/multiple-asynchronous-connections-with-urllib2-or-other-http-library and http://stackoverflow.com/questions/4139988/multiple-urllib2-connections and http://stackoverflow.com/questions/6905800/multiprocessing-useless-with-urllib2 – amit kumar Oct 20 '11 at 05:17
  • well, if you don't want to use threading as @ObscureRobot suggested, you can try [multiprocessing](http://docs.python.org/library/multiprocessing.html). – imm Oct 20 '11 at 05:30
  • ObscureRobot and imm: I don't want CPU threads. As my post says, I want "[to] issue several GET requests at once" - as in HTTP GET request. @phaedrus - thanks, those are an interesting read. Doesn't seem to be very easy to implement, looks like I'd have to rewrite the entire app (over 3000 lines of code) – Bo Milanovich Oct 20 '11 at 05:53
  • @deusdies, we can't know how to help you unless you give us enough context to isolate what is so hard about using multiprocessing with your code. Sample code illustrating the problem would make this an answerable question – Mike Pennington Oct 20 '11 at 10:05
  • related: [Problem with multi threaded Python app and socket connections](http://stackoverflow.com/questions/4783735/problem-with-multi-threaded-python-app-and-socket-connections) – jfs Oct 23 '11 at 18:08
  • have you tried [scrapy](http://doc.scrapy.org/en/latest/intro/overview.html) – jfs Oct 23 '11 at 18:12

2 Answers


Use gevent or eventlet to get concurrent network IO.
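A minimal sketch of the gevent approach (eventlet is similar): monkey-patch the stdlib so blocking socket calls yield to other greenlets, then spawn one greenlet per URL. `fetch` here is a placeholder for the real request code, e.g. a function that opens the page with urllib or mechanize.

```python
# Sketch: concurrent GETs with gevent (assumes gevent is installed).
from gevent import monkey
monkey.patch_all()  # make blocking stdlib IO cooperative

import gevent

def fetch_all(urls, fetch):
    """Run fetch(url) for every URL in parallel greenlets.

    `fetch` is a stand-in for your real request function; each
    greenlet yields to the others whenever it blocks on the network.
    """
    jobs = [gevent.spawn(fetch, url) for url in urls]
    gevent.joinall(jobs)
    return [job.value for job in jobs]
```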

— cerberos

If you want industrial-strength Python web scraping, check out scrapy. It uses Twisted for asynchronous networking and is blindingly fast. Spidering through 50 pages per second isn't an unrealistic expectation.

— synthesizerpatel