I use multiprocessing in my Python code to run a function asynchronously:
import multiprocessing

results = []
po = multiprocessing.Pool()
for elements in a_list:
    # each call ships some_arguments, one element and the big dictionary to a worker
    results.append(po.apply_async(my_module.my_function, (some_arguments, elements, a_big_argument)))
po.close()
po.join()
for r in results:
    a_new_list.add(r.get())
a_big_argument is a dictionary that I pass as an argument. It is big in the sense that it weighs between 10 and 100 MB, and it seems to have a big impact on the performance of my code.
I'm probably doing something inefficient here, since the performance of my code dropped noticeably once I added this new argument.
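If it helps to quantify the cost: as far as I understand, multiprocessing pickles the arguments of every apply_async call before sending them to a worker, so the dictionary is serialized again for each task. A rough sketch of how to measure that (with a_big_argument standing in for my real dictionary):

import pickle

# how much data gets serialized for every single apply_async call
payload = pickle.dumps(a_big_argument, protocol=pickle.HIGHEST_PROTOCOL)
print("pickled size: %d bytes" % len(payload))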
What is the best way to deal with a big dictionary? I don't want to load it every time in my function. Would it be a solution to create a database instead and connect to it?
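To make the database idea concrete, this is roughly what I have in mind (just a sketch using the standard-library sqlite3 module; the file name, table name and schema are placeholders, not code I actually have):

import sqlite3

# store the key/value pairs once in an SQLite file, then let each worker
# open its own connection and fetch only the values it needs
def build_db(big_dict, path="big_dict.sqlite"):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (key INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany("INSERT OR REPLACE INTO kv VALUES (?, ?)", big_dict.items())
    conn.commit()
    conn.close()

def lookup(key, path="big_dict.sqlite"):
    conn = sqlite3.connect(path)
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    conn.close()
    return row[0] if row is not None else None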
Here is some code you can run:
'''
Created on Mar 11, 2013

@author: Antonin
'''
import multiprocessing
import random


# generate an artificially big dictionary
def generateBigDict():
    myBigDict = {}
    for key in range(0, 1000000):
        myBigDict[key] = 1
    return myBigDict


def myMainFunction():
    # load the dictionary
    myBigDict = generateBigDict()
    # create a list on which we will asynchronously run the subfunction
    myList = []
    for list_element in range(0, 20):
        myList.append(random.randrange(0, 1000000))
    # an empty set to receive the results
    set_of_results = set()
    # there is a for loop here on one of the arguments
    for loop_element in range(0, 150):
        results = []
        # asynchronously run the subfunction
        po = multiprocessing.Pool()
        for list_element in myList:
            results.append(po.apply_async(mySubFunction, (loop_element, list_element, myBigDict)))
        po.close()
        po.join()
        for r in results:
            set_of_results.add(r.get())
    for element in set_of_results:
        print(element)


def mySubFunction(loop_element, list_element, myBigDict):
    import math
    intermediaryResult = myBigDict[list_element]
    finalResult = intermediaryResult + loop_element
    return math.log(finalResult)


if __name__ == '__main__':
    myMainFunction()