I have a situation in multiprocessing where the list I use to collect the results from my function is not getting updated by the process. I have two examples of code. Correction: I originally thought the first one updated the list and the second did not, but the first only updates properly when the workers are `threading.Thread` objects and fails when they are `Process` objects. I cannot detect any kind of error. I think this might be a subtlety of scope that I don't understand.
Here is the first example (correction: as posted, with `Process`, it does not work either; it does work with `threading.Thread`):

    def run_knn_result_wrapper(dataset,k_value,metric,results_list,index):
        results_list[index] = knn_result(dataset,k_value,metric)

    results = [None] * (k_upper-k_lower)
    threads = [None] * (k_upper-k_lower)
    joined = [0] * (k_upper-k_lower)
    for i in range(len(threads)):
        threads[i] = Process(target=run_knn_result_wrapper,args=(dataset,k_lower+i,metric,results,i))
        threads[i].start()
        if batch_size == 1:
            threads[i].join()
            joined[i]=1
        else:
            if i % batch_size == batch_size-1 and i > 0:
                for j in range(max(0,i - 2),i):
                    if joined[j] == 0:
                        threads[j].join()
                        joined[j] = 1
    for i in range(len(threads)):
        if joined[i] == 0:
            threads[i].join()

Ignoring the "threads" variable name (this started on threading, but then I found out about the GIL), the `results` list updates perfectly when the workers are `threading.Thread` objects.
Here is the code which does not update the results list:

    def prediction_on_batch_wrapper(batchX,results_list,index):
        results_list[index] = prediction_on_batch(batchX)

    batches_of_X = np.array_split(X,10)
    overall_predicted_classes_list = []
    for i in range(len(batches_of_X)):
        batches_of_X_subsets = np.array_split(batches_of_X[i],10)
        processes = [None]*len(batches_of_X_subsets)
        results_list = [None]*len(batches_of_X_subsets)
        for j in range(len(batches_of_X_subsets)):
            processes[j] = Process(target=prediction_on_batch_wrapper,args=(batches_of_X_subsets[j],results_list,j))
        for j in processes:
            j.start()
        for j in processes:
            j.join()
        if len(results_list) > 1:
            results_array = np.concatenate(tuple(results_list))
        else:
            results_array = results_list[0]

I cannot tell why, within Python's scope rules, the `results_list` list does not get updated by the `prediction_on_batch_wrapper` function.
A debugging session reveals that the `results_list` value inside the `prediction_on_batch_wrapper` function does, in fact, get updated... but somehow its scope is local in this second Python file, and global in the first...
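To take kNN and NumPy out of the picture, here is a stripped-down sketch of the same pattern. The names `store_square` and `collect` are made up for illustration and are not part of my real code. The worker writes into a slot of a list that was created in the parent, exactly like the two wrappers above, and the only thing that changes between runs is the worker class.

```python
from multiprocessing import Process
from threading import Thread


def store_square(results_list, index):
    # Hypothetical stand-in for the wrappers above:
    # write one result into the slot this worker was given.
    results_list[index] = index * index


def collect(worker_cls, n=3):
    """Start n workers of the given class, wait for them, return the parent's list."""
    results = [None] * n
    workers = [worker_cls(target=store_square, args=(results, i)) for i in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print("Thread: ", collect(Thread))   # the list is filled in: [0, 1, 4]
    print("Process:", collect(Process))  # the parent still sees [None, None, None]
```

With `Thread` the parent's list comes back filled in, while with `Process` every slot is still `None` after `join()`, even though no exception is raised anywhere. That matches what I see in both of my real examples.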
What is going on here?