I'm trying to parallelize some calculations, but I do not understand why one of my versions (which I thought should have been faster) is actually slower than the other.
To keep it short: I have a list of userIds (roughly 200) and a list of placeIds (roughly 2000). I need to calculate a score for EACH user/place pair. The good thing is that the calculations are completely independent of each other and (depending on how we implement the algorithm) don't even need to return a result.
I have tried 2 approaches for this.
First approach
- pull ALL the places and ALL the users in the main thread
- loop through all the users and spawn x threads (in my case, on my little MacBook, 8 seems to be the best):

        with cf.ThreadPoolExecutor(max_workers=8) as executor:
            futures = [executor.submit(task, userId, placeIds) for userId in userIds]

- when all the futures are completed, loop through all of them and insert the results in the database (the worker task returns a list of [userId, placeId, score] entries)
I have a task that will loop through ALL the places and return the results:

    def task(userId, placeIds):
        conn = pool.getconn()
        cur = conn.cursor()
        results = []
        # loop through all the places and call makeCalculation(cur, userId, placeId)
        for placeId in placeIds:
            results.append([userId, placeId, makeCalculation(cur, userId, placeId)])
        pool.putconn(conn)
        return results
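For completeness, the main-thread side looks roughly like this (simplified sketch; I'm assuming a psycopg2-style ThreadedConnectionPool behind the getconn/putconn calls, and insert_score just stands in for my real INSERT):

    import concurrent.futures as cf
    from psycopg2.pool import ThreadedConnectionPool

    # one shared, thread-safe pool, sized to match the number of worker threads
    pool = ThreadedConnectionPool(1, 8, "dbname=mydb user=me")

    with cf.ThreadPoolExecutor(max_workers=8) as executor:
        futures = [executor.submit(task, userId, placeIds) for userId in userIds]
        for future in cf.as_completed(futures):
            # each future gives back a list of [userId, placeId, score] rows
            for userId, placeId, score in future.result():
                insert_score(userId, placeId, score)  # stand-in for the real INSERT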
This, ladies and gentlemen, gets the whole set of user/place pairs calculated in 10 minutes (instead of 1h30 sequentially, by the way :)).
But then I thought... why not ALSO parallelize the score calculation? So instead of each task having to loop through all 2000 places one by one, spawn the calculations across another 8 threads, for example.
Second approach:
Basically, this approach replaces the loop in the "task" function with:
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        futures = [executor.submit(calculateScores, userId, placeId) for placeId in placeIds]
The other modification I had to make is in the calculateScores function:
    def calculateScores(userId, placeId):
        conn = pool.getconn()   # <-- new: each call now checks out its own connection
        cur = conn.cursor()     # <--
        # ... make a bunch of calculations, calling the database 1 or 2 times ...
        pool.putconn(conn)
        return [userId, placeId, score]
So as you can see, because calculateScores itself now runs on 8 parallel threads, I cannot share one database connection between them, otherwise I get race condition errors (and the script then crashes 3 out of 4 times).
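Put together, the second approach nests one pool inside the other, roughly like this (again a simplified sketch, same naming as above):

    import concurrent.futures

    def task(userId, placeIds):
        # inner pool: up to 8 score threads per user, inside the outer pool of 8 user threads,
        # and every calculateScores call checks its own connection out of the shared pool
        with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
            futures = [executor.submit(calculateScores, userId, placeId) for placeId in placeIds]
            return [f.result() for f in concurrent.futures.as_completed(futures)]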
This approach, I thought, was going to be faster, but it takes 25 minutes... (instead of 10 with the simple for loop...).
I'm 90% sure this is slower because EVERY task now gets a database connection from the pool, and this is somehow very expensive, hence the slowness.
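A rough way to check that suspicion would be to time the checkouts on their own (single-threaded, no queries), something like:

    import time

    start = time.perf_counter()
    for _ in range(2000):                  # one checkout per place, as in the second approach
        conn = pool.getconn()
        pool.putconn(conn)
    print(time.perf_counter() - start)     # time spent on getconn/putconn round trips alone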
Could someone give me advice on the best way to get the most out of parallelisation for my scenario?
Is it a good idea to have the task return results, or should I just insert them into the database as soon as they are ready, inside the calculateScores function?
Is it good practice to have a ThreadPool inside a ThreadPool?
Should I try to put some multiprocessing into action?
thank you!