Short version:
```python
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=12) as executor:
    for index, d in enumerate(found):
        # submit one download job per URL; the pool caps concurrency at 12
        executor.submit(download, found[d], d, index)
```
That's it: a trivial change, two lines shorter than your existing code, and you're done.
So, what's wrong with your existing code? Starting 1000 threads at a time is always a bad idea.* Once you get beyond a few dozen, you're adding more scheduler and context-switching overhead than you're gaining in concurrency.
If you want to know why it fails right around 1000, it could be because of a library working around older versions of Windows,** or because you're running out of stack space.*** But either way, it doesn't really matter; the right solution is not to use so many threads.
The usual solution is to use a thread pool: start about 8-12 threads**** and have them pull the URLs to download off a queue. You can build this yourself (see the sketch below), or you can use the `concurrent.futures.ThreadPoolExecutor` or `multiprocessing.dummy.Pool` that come with the stdlib. If you look at the main `ThreadPoolExecutor` example in the docs, it's doing almost exactly what you want. In fact, what you want is even simpler, because you don't care about the results.
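If you did want to build it yourself, a minimal sketch might look like this (assuming the same `found` dict and `download` function from your code):

```python
import queue
import threading

NUM_WORKERS = 12  # rule-of-thumb pool size; tune for your workload

def worker(q):
    # Pull (url, name, index) jobs off the queue until the None sentinel arrives.
    while True:
        job = q.get()
        if job is None:
            break
        url, name, index = job
        download(url, name, index)  # `download` is your existing function

q = queue.Queue()
threads = [threading.Thread(target=worker, args=(q,)) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for index, d in enumerate(found):  # `found` is your existing dict
    q.put((found[d], d, index))
for _ in threads:
    q.put(None)  # one sentinel per worker so each one shuts down
for t in threads:
    t.join()
```

As you can see, `ThreadPoolExecutor` saves you all of that bookkeeping.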
As a side note, you've got another serious problem in your code. If you daemonize your threads, you're not allowed to `join` them. Also, you're only trying to `join` the last one you created, which is by no means guaranteed to be the last one to finish. Also, daemonizing download threads is probably a bad idea in the first place, because when your main thread finishes (after waiting for one arbitrarily-chosen download to finish) the others may get interrupted and leave partial files behind.
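For comparison, the minimal correct way to wait on hand-rolled, non-daemon threads is to keep all of them and `join` all of them (a sketch, again assuming your `download` and `found`; note it still has the too-many-threads problem the pool solves):

```python
import threading

threads = []
for index, d in enumerate(found):
    t = threading.Thread(target=download, args=(found[d], d, index))
    t.start()
    threads.append(t)

# Wait for every download, not just the last thread you happened to create.
for t in threads:
    t.join()
```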
Also, if you do want to daemonize a thread, the best way is to pass `daemon=True` to the constructor. If you need to do it after creation, just do `t.daemon = True`. Only call the deprecated `setDaemon` function if you need backward compatibility to Python 2.5.
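For illustration (with a hypothetical `work` function, not from your code):

```python
import threading

def work():
    pass  # placeholder

t = threading.Thread(target=work, daemon=True)  # best: constructor argument
t2 = threading.Thread(target=work)
t2.daemon = True                                # fine: set after creation
# t2.setDaemon(True)                            # deprecated; Python 2.5 only
```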
* I guess I shouldn't say always: in 2025 it'll probably be an everyday thing to do, to take advantage of your thousands of slow cores. But in 2014, on normal laptop/desktop/server hardware, it's always bad.
** Older versions of Windows (at least NT 4) had all kinds of bizarre errors when you got close to 1024 threads, so many threading libraries just refuse to create more than 1000 threads. Although that doesn't seem to be the case here, as Python is just calling Microsoft's own wrapper function `_beginthreadex`, which doesn't do that.
*** By default, each thread gets 1MB of stack space. And in 32-bit apps, there's a maximum total stack space, which I'd assume defaults to 1GB on your version of Windows. You can customize either the stack space for each thread or the total process stack space, but Python doesn't customize either, nor do almost any other apps.
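(If you really did need huge numbers of threads, Python does expose the per-thread stack size via `threading.stack_size`; a sketch, with the 512 KiB figure chosen purely for illustration:)

```python
import threading

# Must be called before the threads are created. The minimum is 32 KiB,
# and some platforms require a multiple of 4 KiB.
threading.stack_size(512 * 1024)

t = threading.Thread(target=print, args=("running with a smaller stack",))
t.start()
t.join()
```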
**** Unless your downloads are all coming off the same server, in which case you probably want at most 4, and really more than 2 is usually considered impolite if it's not your server. And why 8-12 anyway? It was a rule of thumb that tested well a long time ago. It's probably not optimal anymore, but it's probably close enough for most uses. If you really need to squeeze out a bit more performance, you can test with different numbers.
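If you do decide to test, a crude timing harness might look like this (assuming the same `found` and `download` as above; repeat each run a few times, since network variance is large):

```python
import concurrent.futures
import time

for workers in (2, 4, 8, 12, 16):
    start = time.monotonic()
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        for index, d in enumerate(found):
            executor.submit(download, found[d], d, index)
    # The with-block waits for all submitted downloads to finish.
    print("{} workers: {:.1f}s".format(workers, time.monotonic() - start))
```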