Suppose I have the following multiprocessing structure:
import multiprocessing as mp

def worker(working_queue, output_queue):
    # Drain the pre-filled input queue until it is empty, pushing each
    # processed item onto the output queue.
    while True:
        if working_queue.empty():
            break
        else:
            picked = working_queue.get()
            res_item = "Number " + str(picked)
            output_queue.put(res_item)
    return

if __name__ == '__main__':
    static_input = xrange(100)
    working_q = mp.Queue()
    output_q = mp.Queue()
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker, args=(working_q, output_q)) for i in range(2)]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    # Collect everything the workers produced once they have all finished.
    results_bank = []
    while True:
        if output_q.empty():
            break
        results_bank.append(output_q.get_nowait())
    if len(results_bank) == len(static_input):
        print "Good run"
    else:
        print "Bad run"
My question: how would I batch-write my results to a single file while the working_queue is still being processed (or at least, before the workers have finished)?
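To make the intent concrete, here is a rough sketch of the kind of thing I am imagining: a dedicated writer process that drains output_q and appends to a file in batches while the workers are still running. The file name results.txt, the batch size of 10, and the use of None as a stop sentinel are all placeholders I made up for illustration, not part of my real code:

def file_writer(output_queue, path, batch_size=10):
    # Pull finished results and append them to the file in batches, so
    # writing overlaps with the workers that are still producing items.
    batch = []
    while True:
        item = output_queue.get()
        if item is None:  # None used here as a hypothetical stop sentinel
            break
        batch.append(item)
        if len(batch) >= batch_size:
            with open(path, 'a') as f:
                f.write('\n'.join(batch) + '\n')
            batch = []
    if batch:  # flush any leftover partial batch
        with open(path, 'a') as f:
            f.write('\n'.join(batch) + '\n')

The main process would then start this writer alongside the workers, and after joining the workers put a single None on output_q and join the writer:

writer_proc = mp.Process(target=file_writer, args=(output_q, 'results.txt'))
writer_proc.start()
# ... start and join the workers as above ...
output_q.put(None)  # signal the writer that no more results are coming
writer_proc.join()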
Note: my actual data is not sensitive to the order of the results relative to the inputs (even though this example uses integers).
Also, I suspect that batch writing from the output queue is better practice than writing from the ever-growing results_bank object, but I am open to solutions using either approach. I am new to multiprocessing, so I am unsure of the best practice or the most efficient solution here.
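For the other approach (writing from the main process rather than a separate writer), here is a rough sketch of what I imagine replacing the final drain loop with: it polls output_q while any worker is still alive and appends each full batch to the file, so the file fills up as results arrive instead of everything accumulating in results_bank. Again, the batch size, the 0.1 second timeout, and the file name are placeholders of mine:

import Queue  # stdlib module; multiprocessing raises Queue.Empty on a get() timeout

def drain_in_batches(output_queue, procs, path, batch_size=10):
    # Poll the output queue while any worker is alive (and once more for
    # stragglers), writing each full batch to the file instead of keeping
    # every result in memory.
    batch = []
    while any(p.is_alive() for p in procs) or not output_queue.empty():
        try:
            batch.append(output_queue.get(timeout=0.1))
        except Queue.Empty:
            continue
        if len(batch) >= batch_size:
            with open(path, 'a') as f:
                f.write('\n'.join(batch) + '\n')
            batch = []
    if batch:
        with open(path, 'a') as f:
            f.write('\n'.join(batch) + '\n')

This would be called after starting the workers but before joining them, so that draining and writing overlap with the work itself.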