
Say I have a simple function: it connects to a database (or a queue), gets a URL that hasn't been visited, and then fetches the HTML at that URL.

Right now this process is serial, i.e. it fetches the HTML from one URL at a time. How can I make it faster by doing the fetches in a group of threads?
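For concreteness, here is a minimal sketch of the serial version being described. The names (`url_queue`, `fetch_html`, `crawl_serial`) are hypothetical, and an in-memory queue and a stub fetch stand in for the real database and network call:

```python
import queue

# Hypothetical stand-in for the database/queue of unvisited URLs.
url_queue = queue.Queue()
for u in ["http://example.com/a", "http://example.com/b"]:
    url_queue.put(u)

def fetch_html(url):
    # Stand-in for a real network fetch (e.g. urllib.request.urlopen).
    return "<html>%s</html>" % url

def crawl_serial():
    """Fetch URLs one at a time; each fetch blocks before the next starts."""
    results = {}
    while not url_queue.empty():
        url = url_queue.get()
        results[url] = fetch_html(url)
    return results
```

Each iteration waits for the previous fetch to finish, which is exactly the bottleneck the question is about.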

Blankman
    Kind of a dup of this: http://stackoverflow.com/questions/1491993/download-multiple-pages-concurrently – Gerrat Feb 10 '11 at 20:11

2 Answers


Yes. Many Python threading examples are built around exactly this idea, since it's a good use of threads.

Just to pick the top four Google hits on "python threads url": 1, 2, 3, 4.

Basically, tasks that are I/O-bound are good candidates for threading speed-ups in Python; tasks that are CPU-bound usually need a different tool (such as multiprocessing).
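The pattern those examples share can be sketched as a pool of worker threads pulling URLs from a shared queue. This is a hedged sketch in Python 3 syntax, with a stub `fetch_html` in place of a real network call (names and worker count are assumptions):

```python
import threading
import queue

def fetch_html(url):
    # Stand-in for a real I/O-bound fetch (e.g. urllib.request.urlopen);
    # while one thread waits on the network, the others can proceed.
    return "<html>%s</html>" % url

def fetch_all(urls, num_workers=4):
    """Fetch many URLs concurrently with a fixed pool of threads."""
    work = queue.Queue()
    for u in urls:
        work.put(u)

    results = {}
    lock = threading.Lock()  # guard the shared results dict

    def worker():
        while True:
            try:
                url = work.get_nowait()
            except queue.Empty:
                return  # no work left; thread exits
            html = fetch_html(url)
            with lock:
                results[url] = html

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With real I/O, the total time approaches that of the slowest fetch per batch rather than the sum of all fetches.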

tom10

You can do this using any of:

  • the thread module (if your task is a function)
  • the threading module (if you want to write your task as a subclass of threading.Thread)
  • the multiprocessing module (which uses a similar interface to threading)

All of these are available in the Python standard library (2.6 and later), and for earlier versions you can install the multiprocessing module separately (it just wasn't bundled with Python yet).
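The subclassing approach from the second bullet might look like this (a sketch in Python 3 syntax; `FetchThread` and the stubbed-out fetch are assumptions, not part of the original answer):

```python
import threading

class FetchThread(threading.Thread):
    """Write the task as a subclass of threading.Thread: the work goes in run()."""

    def __init__(self, url):
        super().__init__()
        self.url = url
        self.html = None

    def run(self):
        # Stand-in for a real network fetch.
        self.html = "<html>%s</html>" % self.url

# Start one thread per URL, then collect the results after joining.
threads = [FetchThread(u) for u in ["http://example.com/1",
                                    "http://example.com/2"]]
for t in threads:
    t.start()
for t in threads:
    t.join()
pages = {t.url: t.html for t in threads}
```

The multiprocessing module exposes a near-identical interface (`multiprocessing.Process` with a `run` method), which is why the answer groups them together.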

Kevin Horn