I'd personally suggest sizing the pool as a multiple of multiprocessing.cpu_count(). This is the approach concurrent.futures.ThreadPoolExecutor took for its default in Python 3.5 through 3.7, using 5 * multiprocessing.cpu_count() on the theory that threaded work blocks a lot, so you want more threads than cores (Python 3.8 and later default to min(32, os.cpu_count() + 4) instead). If you've got a huge internet pipe, a higher multiple might make sense. You can restrict it to min(len(users), 5 * multiprocessing.cpu_count()) if you like, which avoids allocating more threads than you have tasks to keep them busy.
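As a rough sketch of that sizing rule (not a drop-in for your code): `fetch` and the generated `users` list below are hypothetical placeholders for your real request function and data, and `pool_size` is just a helper name I made up. The `max(1, ...)` guard is there because ThreadPoolExecutor rejects a worker count of zero.

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor


def pool_size(num_tasks, multiple=5):
    """Workers = multiple * cores, capped at the task count, never below 1."""
    return max(1, min(num_tasks, multiple * multiprocessing.cpu_count()))


def fetch(user):
    # Hypothetical stand-in for the real blocking network request.
    return f"fetched {user}"


# Placeholder data; substitute your own list of users.
users = [f"user{i}" for i in range(10_000)]

with ThreadPoolExecutor(max_workers=pool_size(len(users))) as executor:
    # map() feeds tasks to the fixed set of workers and
    # returns results in the same order as the inputs.
    results = list(executor.map(fetch, users))
```

With 10,000 users this still only creates 5 * cpu_count threads; the executor queues the remaining tasks and hands them out as workers free up. Bump `multiple` if your connection can sustain more concurrent requests.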
You don't want to use a thread per task because there are limits on the number of threads and open handles, and on many systems those limits are fairly low; trying to do everything at once can hit them if you're talking about 10,000+ requests. And since your internet connection likely can't benefit from parallelism beyond a certain point, any threads past that point just waste resources.