0

I want to send a number of HTTP requests concurrently, and I am using Python's multiprocessing.dummy.Pool to do it. Here is the code that creates the thread pool:

from multiprocessing.dummy import Pool  # thread-backed Pool
p = Pool(len(users))

len(users) is simply the number of requests.

As you can see, I am creating a thread for each request. Is this a bad idea? Should I instead create only a fraction of len(users) threads?

Paul Rooney
JRG
  • Are you seeing any problems with your current approach? – wwii Oct 25 '17 at 02:39
  • Not quite a duplicate, and I don't see any rationale there for choosing the number of *threads/processes/connections*, but you might like it: https://stackoverflow.com/q/2632520/2823755 – wwii Oct 25 '17 at 02:43

1 Answer

1

I'd personally suggest sizing the pool as a multiple of multiprocessing.cpu_count(). This is the approach concurrent.futures.ThreadPoolExecutor took when this was written: its default (Python 3.5 through 3.7) is 5 * the CPU count, on the theory that threaded work spends much of its time blocked, so you want more threads than cores. (Python 3.8 changed that default to min(32, os.cpu_count() + 4).) If you've got a huge internet pipe, a higher multiple might make sense. You can cap it at min(len(users), 5 * multiprocessing.cpu_count()) if you like, which avoids allocating more threads than you have tasks to keep them busy.

You don't want a thread per task because there are limits on threads and open handles, and those limits can be fairly low on many systems; trying to do everything at once can hit them if you're talking about 10,000+ requests. Given that your internet connection likely can't benefit from parallelism beyond a certain point, the extra threads would just waste resources.
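The sizing rule above could be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: `pool_size`, `fetch`, and `fetch_all` are hypothetical names, `fetch` is a stand-in for the real HTTP request, and the multiplier of 5 is just the heuristic mentioned above.

```python
import multiprocessing
from multiprocessing.dummy import Pool  # thread-backed Pool with the multiprocessing.Pool API


def pool_size(num_tasks, multiplier=5):
    """Cap the thread count at multiplier * cores, never more than the task count."""
    # Pool() rejects a size of 0, so keep at least one worker.
    return max(1, min(num_tasks, multiplier * multiprocessing.cpu_count()))


def fetch(user):
    # Hypothetical placeholder: the real code would issue the HTTP request here.
    return "fetched " + user


def fetch_all(users):
    # The pool is bounded, yet map() still processes every user;
    # tasks beyond the pool size simply wait for a free thread.
    with Pool(pool_size(len(users))) as pool:
        return pool.map(fetch, users)


if __name__ == "__main__":
    users = ["user%d" % i for i in range(1000)]
    results = fetch_all(users)
    print(len(results), "requests completed with", pool_size(len(users)), "threads")
```

With 1,000 users this runs every task through a fixed number of threads instead of creating 1,000 of them; the multiplier is the knob to raise if the network can absorb more parallel requests.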

ShadowRanger
  • I think I will use a different approach from threads, actually. In Python there is a thing called asyncio. I think this is like the select function in Linux? The idea is that you send non-blocking requests, and then wait for any of them to be done? – JRG Oct 26 '17 at 07:07
  • @JRG: Sort of. Python has a `select` (and in modern Python, `selectors`) module that's a direct interface, but yes, the async features can do similar things (though they tend to hijack the whole program design if you use them much). – ShadowRanger Oct 26 '17 at 10:41
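For reference, the asyncio idea raised in the comments might look roughly like the sketch below. It is only an illustration under stated assumptions: `fetch`, `fetch_all`, and `run` are hypothetical names, `asyncio.sleep` stands in for a non-blocking HTTP request (a real program would need an async HTTP client, which the thread does not name), and `asyncio.run` requires Python 3.7 or later. A semaphore caps the number of requests in flight, playing the role the pool size plays for threads.

```python
import asyncio


async def fetch(user, limit):
    # The semaphore bounds how many "requests" are in flight at once.
    async with limit:
        # Hypothetical placeholder for a non-blocking HTTP request.
        await asyncio.sleep(0.01)
        return "fetched " + user


async def fetch_all(users, max_in_flight=10):
    limit = asyncio.Semaphore(max_in_flight)
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(fetch(user, limit) for user in users))


def run(users):
    # One thread, one event loop; waiting requests cost no extra threads.
    return asyncio.run(fetch_all(users))


if __name__ == "__main__":
    print(len(run(["user%d" % i for i in range(100)])), "requests completed")
```

Every request function has to be a coroutine for this to work, which is the "hijack the whole program design" trade-off mentioned in the reply.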