Python/Urllib2/Threading: Single download thread faster than multiple download threads. Why?

Question

I am working on a project that requires me to create multiple threads to download a large remote file. I have done this already, but I cannot understand why it takes longer to download the file with multiple threads than with a single thread. I used my XAMPP localhost to run the elapsed-time test. I would like to know whether this is normal behaviour, or whether it is because I have not tried downloading from a real server.

Thanks, Kennedy

Provide some numbers, please. One download takes how long? Two concurrent downloads take over twice as long? Three take over three times as long? What are you talking about? Please provide the numbers you're seeing. Also, please provide the smallest code snippet of your multi-threaded code. There's a small possibility you're doing it wrong. – S.Lott, Nov 18 '10 at 20:47

score 4 · Accepted Answer · answered Nov 18 '10 at 20:46

9 women can't combine to make a baby in one month. If 10 threads share the same connection, each gets only about 10% of the bandwidth a single thread would get, and there is additional overhead for context switching, etc.
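As a back-of-the-envelope illustration of that arithmetic, here is a minimal model. It assumes the link is the only bottleneck and is shared evenly among threads. The function name, the per-thread overhead figure and the sizes are hypothetical, chosen only to show the calculation:

```python
def estimated_download_time(file_size_mb, link_mb_per_s, n_threads,
                            overhead_s_per_thread=0.05):
    """Rough model: the link is the bottleneck and is split evenly.

    overhead_s_per_thread is a made-up figure standing in for thread
    start-up, context switching and extra connection setup.
    """
    per_thread_bandwidth = link_mb_per_s / n_threads   # each thread's share
    chunk_size_mb = file_size_mb / n_threads           # each thread's part
    transfer_s = chunk_size_mb / per_thread_bandwidth  # same for any n_threads
    return transfer_s + overhead_s_per_thread * n_threads


single = estimated_download_time(100, 10, n_threads=1)
ten = estimated_download_time(100, 10, n_threads=10)
print(single, ten)
```

In this model splitting the file never shortens the transfer itself; extra threads only add overhead. They can help only when something other than the shared link is the limit, for example a per-connection rate cap on the server.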

Karl Bielefeldt


score 1 · Answer 2 · answered Nov 18 '10 at 21:46

Python threading uses something called the GIL (Global Interpreter Lock), which sometimes degrades a program's execution time.

Without going into a lot of detail here, I invite you to read this and this; maybe they can help you understand your problem. You can also watch the two conference talks here and here.

Hope this can help :)

mouad


The GIL is released while waiting for I/O, so this is not a case of GIL weirdness. – andreypopp Nov 18 '10 at 22:42
@andreypopp: have you looked at the links in my answer? I/O-bound processes are also affected (intentionally) by the GIL when running on a multi-core processor (which is often the case these days). I didn't want to explain everything I know about the GIL and I/O-bound versus CPU-bound processes, because I thought the conference videos would do better than my poor knowledge and English. So take a look at the links; they discuss the OP's case. – mouad Nov 18 '10 at 23:27
Sorry, I don't have time right now to watch the video you mention, but I guess you're talking about CPU-bound and I/O-bound threads in the same process. That can be an issue, but purely I/O-bound threads work just fine. Of course you cannot spawn thousands of threads like you can in Erlang, but having 3-5 threads for concurrent downloads is a common situation. – andreypopp Nov 19 '10 at 06:41
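The claim that threads which only wait do not serialize on the GIL can be checked with a small sketch. Here `time.sleep` is an assumed stand-in for a blocking socket read (both release the GIL while waiting), and the helper names are hypothetical:

```python
import threading
import time


def blocking_wait(seconds):
    # time.sleep stands in for a blocking socket read; like blocking
    # I/O calls, it releases the GIL while it waits.
    time.sleep(seconds)


def run_concurrently(n_threads, seconds):
    """Start n_threads that each block for `seconds`; return wall time."""
    threads = [threading.Thread(target=blocking_wait, args=(seconds,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_concurrently(n_threads=4, seconds=0.2)
print("4 threads waiting 0.2 s each took %.2f s in total" % elapsed)
```

If the waits were serialized, the total would be close to 0.8 s; on a typical machine it should stay close to 0.2 s.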

score 1 · Answer 3 · answered Nov 19 '10 at 06:02

Twisted uses non-blocking I/O: if data is not available on a socket right now, the call doesn't block the entire thread, so you can handle many socket connections waiting for I/O in one thread simultaneously. But if you do something other than I/O (such as parsing large amounts of data), you still block the thread.

When you use the stdlib's socket module, it does blocking I/O by default: when you call socket.recv and data is not available at the moment, it blocks the entire thread, so you need one thread per connection to handle concurrent downloads.

These are two approaches to concurrency:

Spawn a new thread for each new connection (threading + socket from the stdlib).
Multiplex I/O and handle many connections in one thread (Twisted).
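
The two approaches can be sketched side by side using only the standard library. This is a rough illustration under stated assumptions: `selectors` stands in for Twisted's event loop (it is not Twisted itself), `socket.socketpair()` stands in for real network connections, and all function names are hypothetical:

```python
import selectors
import socket
import threading


def make_connections(n):
    """Create n local socket pairs; the 'server' end sends one message."""
    clients = []
    for i in range(n):
        server_end, client_end = socket.socketpair()
        server_end.sendall(b"payload-%d" % i)
        server_end.close()      # closing signals end-of-stream to the reader
        clients.append(client_end)
    return clients


def read_all_blocking(sock, results, index):
    # Approach 1: blocking recv, one thread per connection.
    chunks = []
    while True:
        data = sock.recv(4096)  # blocks this thread until data or EOF
        if not data:
            break
        chunks.append(data)
    sock.close()
    results[index] = b"".join(chunks)


def download_with_threads(socks):
    results = [None] * len(socks)
    threads = [threading.Thread(target=read_all_blocking, args=(s, results, i))
               for i, s in enumerate(socks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def download_multiplexed(socks):
    # Approach 2: one thread, non-blocking sockets, readiness notification.
    results = [b""] * len(socks)
    selector = selectors.DefaultSelector()
    for i, s in enumerate(socks):
        s.setblocking(False)
        selector.register(s, selectors.EVENT_READ, data=i)
    remaining = len(socks)
    while remaining:
        for key, _ in selector.select():
            data = key.fileobj.recv(4096)
            if data:
                results[key.data] += data
            else:               # empty read means the peer closed
                selector.unregister(key.fileobj)
                key.fileobj.close()
                remaining -= 1
    selector.close()
    return results


threaded = download_with_threads(make_connections(3))
multiplexed = download_multiplexed(make_connections(3))
print(threaded)
print(multiplexed)
```

Both functions return the same data; the difference is that the first parks one thread per socket inside `recv`, while the second keeps a single thread that only reads from sockets the selector reports as ready.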
