Let's say I have the following function:
def fetch_num():
x = np.random.randint(low=0, high=1000000) # choose a number
for i in range(5000000): # do some calculations
j = i ** 2
return x # return a result
This function picks a random number, then does some calculations, and returns it.
I would like to create a large list, containing all of these results. The catch is, that I don't want to process the same number twice, and I want to use multiprocessing
to make that quicker.
I've tried the following code:
import multiprocessing as mp
from tqdm import tqdm
from parallelizing_defs import fetch_num
import os
os.system("taskset -p 0xff %d" % os.getpid())
if __name__ == '__main__':
n = 10 # number of numbers that I want to gather
def collect_result(result): # a callback function - only append if it is not in the results list
if result not in results:
results.append(result)
pbar.update(1) # this is just for the fancy progress bar
def error_callback(e):
raise e
pool = mp.Pool(6) # create 6 workers
global results # initialize results list
results = []
pbar = tqdm(total=n) # initialize a progress bar
while len(results) < n: # work until enough results have been accumulated
pool.apply_async(fetch_num, args=(), callback=collect_result, error_callback=error_callback)
pool.close()
pool.join()
Notes:
- the function
fetch_num
is imported from a different python file since I understand that it doesn't work within the same file from this issue Multiprocessing example giving AttributeError - the weird
os
line, I added after reading this issue: Why does multiprocessing use only a single core after I import numpy?
My problem is:
- The loop doesn't stop, it goes on forever.
- The iterations are not faster, it doesn't seem to be using more than one core.
I've tried a bunch of other configurations, but it doesn't seem to work. This sounds like a very common situation but I haven't been able to find an example of that particular problem. Any ideas as to why these behaviours take place would be much appreciated.