I'm trying to parallelize some calculations, but I don't understand why one of my versions (which I thought should have been faster) is slower than the other.

To keep it short: I have a list of userIds (more or less 200) and a list of placeIds (more or less 2000). I need to calculate a score for EACH user/place pair. The good thing is that the calculations are completely independent of each other and (depending on how we implement the algorithm) don't even need to return a result.

I have tried 2 approaches for this.

First approach:

  1. pull ALL the places and ALL the users in the main thread
  2. loop through all the users and spawn x threads (in my case, on my little MacBook, 8 seems to be the best)

    import concurrent.futures as cf  # alias used below

    with cf.ThreadPoolExecutor(max_workers=8) as executor:
        futures = [executor.submit(task, userId, placeIds) for userId in userIds]
    

    when all the futures are completed, I loop through all of them and insert the results in the database (the worker task returns a list of [userId, placeId, score] entries)

  3. I have a task that loops through ALL the places and returns the results (a consolidated, runnable sketch follows below):

    def task(userId, placeIds):
        conn = pool.getconn()
        cursor = conn.cursor()
        # loop through all the places and call makeCalculation(cursor, userId, placeId)
        pool.putconn(conn)
        return results
    

This, ladies and gentlemen, gets the whole set of user/place pairs calculated in 10 minutes (instead of 1.5 hours sequentially, by the way :)).
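For reference, the whole first approach boils down to something like this (a consolidated sketch; insertScore is a hypothetical stand-in for my actual INSERT statement, and pool is the psycopg2-style connection pool from above):

    import concurrent.futures as cf

    def task(userId, placeIds):
        conn = pool.getconn()  # one connection per user-task
        cursor = conn.cursor()
        results = [[userId, placeId, makeCalculation(cursor, userId, placeId)]
                   for placeId in placeIds]
        pool.putconn(conn)
        return results

    with cf.ThreadPoolExecutor(max_workers=8) as executor:
        futures = [executor.submit(task, userId, placeIds) for userId in userIds]
    # the with-block waits for all tasks; the main thread then does the inserts
    for future in futures:
        for userId, placeId, score in future.result():
            insertScore(userId, placeId, score)  # hypothetical insert helper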

But then I thought: why not ALSO parallelize the score calculation? So instead of a task having to loop through all 2000 places one by one, it would spawn the calculations onto, for example, 8 other threads.

Second approach:

Basically, this approach replaces the loop in the "task" function with:

    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        futures = [executor.submit(calculateScores, userId, placeId) for placeId in placeIds]

The other modification I had to make is in the calculateScores function:

    def calculateScores(userId, placeId):
        conn = pool.getconn()
        cursor = conn.cursor()
        # ... make a bunch of calculations, calling the database 1 or 2 times ...
        pool.putconn(conn)
        return [userId, placeId, score]

So, as you can see, because calculateScores itself now runs on 8 parallel threads (inside each of the 8 outer tasks, so up to 64 worker threads in flight), I cannot share a single database connection, otherwise I get race-condition errors (and the script crashes about 1 time out of 3 or 4).

This approach, I thought, was going to be faster, but it takes 25 minutes... (instead of 10 with the simple for loop).

I'm 90% sure this is slower because EVERY task now gets its own database connection from the pool, and this is somehow very expensive, hence the slowness.
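One thing I might try to confirm this: check out a single connection per worker thread (via threading.local) instead of one per pair, so getconn()/putconn() runs on the order of 8 times instead of ~400,000 times (200 users × 2000 places). A rough sketch (the per-thread connections would still have to be returned to the pool once the executor shuts down):

    import threading

    local = threading.local()

    def getThreadConn():
        # the first call on each worker thread checks out a connection;
        # every later call on the same thread reuses it
        if not hasattr(local, "conn"):
            local.conn = pool.getconn()
        return local.conn

calculateScores would then call getThreadConn() instead of pool.getconn() and skip the per-call putconn().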

Could someone give me advice on the best way to get the most out of parallelisation in my scenario?

Is it a good idea to have the tasks return results, or should I just insert them into the database as soon as they are ready, inside the calculateScores function?

Is it good practice to have a ThreadPool inside a ThreadPool?

Should I try to put some multiprocessing into action?

thank you!

  • Are you aware of the GIL? The best explanation I know of is by David Beazley: http://www.dabeaz.com/python/UnderstandingGIL.pdf – cdarke Jan 08 '15 at 10:39
  • I thought I did, but apparently not. Well the first example he gives is spot on why my second approach is slower... And obviously there is no work around this ? even if I start using processes ? – Johny19 Jan 08 '15 at 10:49
  • Threads and Python in the same sentence... bad omen – Alex Gidan Jan 08 '15 at 10:50
  • Good to know, I had this task done by Java and I thought that using python was going to be faster (well it is (20 minutes with java version)) But i thought I could have pushed it even more with approach 2 – Johny19 Jan 08 '15 at 10:52
  • @AlexGidan: a bad workman blames his tools. There are plenty of jobs where Python + threads are useful together. – jfs Jan 08 '15 at 20:17
  • @J.F.Sebastian I am not blaming a tool here. A good workman should choose the right tool. Python historically does not fit well with multi-threading (even if enormous efforts are being made to overcome this limitation), that's a fact. So, IMHO, better to choose another tool in case of heavy parallelism needs. – Alex Gidan Jan 13 '15 at 14:00
  • @AlexGidan: Your claim: python + threads are **always** bad. My claim: there are cases when they are a good fit. It is not clear from the question whether threads can improve performance here but it is not clear the opposite too: that is threads can improve performance (in principle) – jfs Jan 13 '15 at 14:26

1 Answer


Is it good practice to have a ThreadPool inside a ThreadPool?

No, a single thread pool is enough in your case, e.g.:

    from concurrent.futures import ThreadPoolExecutor as Executor
    from collections import deque
    from itertools import product

    with Executor(max_workers=8) as executor:
        # generate every user/place pair (not just zipped pairs)
        pairs = product(userIds, placeIds)
        deque(executor.map(lambda pair: calculateScores(*pair), pairs), maxlen=0)
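Here deque(..., maxlen=0) merely exhausts the iterator while discarding the results: consuming the iterator returned by executor.map() is what actually waits for the tasks to finish (and re-raises any exception from a worker).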

If the database is the bottleneck in your application (to find out, you could mock the db calls), i.e., if the task is I/O-bound, then threads can improve time performance (up to a point), because the GIL can be released during I/O and other blocking OS calls, either by Python itself or by a C extension such as a db driver for CPython.
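For example, a rough sketch of mocking the db away (fake_cursor stands in for the real cursor; the return-value shape depends on your queries): time a sequential run with every db call answered instantly and compare it with the real sequential run. If the mocked run is dramatically faster, the task is I/O-bound:

    import time
    from unittest import mock

    fake_cursor = mock.MagicMock()
    fake_cursor.fetchone.return_value = (0,)  # shape depends on your queries

    start = time.perf_counter()
    for userId in userIds:
        for placeId in placeIds:
            makeCalculation(fake_cursor, userId, placeId)
    print("db mocked out: %.1fs" % (time.perf_counter() - start))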

If the database handles the concurrent access well then each thread could use its own db connection. Note: 8 threads can be faster than both 4 and 16 threads -- you need to measure it.

The time performance may depend greatly on how you structure db operations. See Improve INSERT-per-second performance of SQLite?
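For example, a sketch of batching all inserts into one transaction in the main thread, instead of committing row by row (the scores table and column names here are hypothetical):

    # rows is the [(userId, placeId, score), ...] list collected from the futures
    cursor.executemany(
        "INSERT INTO scores (user_id, place_id, score) VALUES (%s, %s, %s)",
        rows)
    conn.commit()  # one transaction for the whole batch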

If the task is CPU-bound, e.g., you perform some expensive pure-Python calculations for each user/place id, then you could try ProcessPoolExecutor instead of ThreadPoolExecutor. Make sure that copying input/output data between processes does not dominate the computations themselves.
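A sketch of the process-based variant (scorePair below is a dummy stand-in for the expensive pure-Python scoring; each worker process would also have to open its own db connection, because connections cannot be shared across process boundaries):

    from concurrent.futures import ProcessPoolExecutor
    from itertools import product

    def scorePair(pair):
        userId, placeId = pair
        return userId, placeId, (userId * placeId) % 97  # dummy CPU work

    if __name__ == "__main__":  # required for process pools on some platforms
        with ProcessPoolExecutor(max_workers=8) as executor:
            for userId, placeId, score in executor.map(scorePair,
                                                       product(userIds, placeIds)):
                pass  # insert into the database from the main process

Passing input in chunks (the chunksize argument of map(), Python 3.5+) helps keep the inter-process copying from dominating.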

jfs