Here is what my code looks like:
from multiprocessing import Process, Queue

def data_processing_function(some_data):
    # ... do some things with some_data ...
    queue.put(some_data)

processes = []
queue = Queue()

for data in bigdata:
    if meets_criteria(data):  # stands in for my actual filtering check
        prepared_data = prepare_data(data)
        processes += [Process(target=data_processing_function,
                              args=(prepared_data,))]
        processes[-1].start()

for process in processes:
    process.join()

results = []
for i in range(queue.qsize()):
    results += [queue.get()]
When I tried this with a reduced dataset, everything went smoothly. But when I launched it with the full dataset, the script appeared to hang indefinitely at the process.join() step.
In a desperate move, I killed all the processes except the main one, and execution continued. Now it hangs on the queue.get() call, with no noticeable CPU or RAM activity.
What can cause this? Is my code properly designed?
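
For reference, here is a stripped-down, self-contained sketch of the same pattern with dummy data, in case it helps to have something runnable. The dataset, the criteria check, and the per-item work are placeholders standing in for my real code, and I pass the queue explicitly just to keep the snippet self-contained:

from multiprocessing import Process, Queue

def data_processing_function(some_data, queue):
    # placeholder for the real per-item work
    queue.put(some_data * 2)

if __name__ == '__main__':
    queue = Queue()
    processes = []
    bigdata = range(100)                  # dummy dataset

    for data in bigdata:
        if data % 2 == 0:                 # dummy criteria check
            prepared_data = data          # stands in for prepare_data(data)
            processes += [Process(target=data_processing_function,
                                  args=(prepared_data, queue))]
            processes[-1].start()

    for process in processes:
        process.join()

    results = []
    for i in range(queue.qsize()):
        results += [queue.get()]
    print(len(results), "results collected")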