
Here is what my code looks like:

def data_processing_function(some_data):
    # do some things with some_data
    queue.put(some_data)

processes = []
queue = Queue()

for data in bigdata:
    if data meets certain criteria:  # pseudocode condition
        prepared_data = prepare_data(data)
        processes.append(Process(target=data_processing_function,
                                 args=(prepared_data,)))
        processes[-1].start()

for process in processes:
    process.join()

results = []
for i in range(queue.qsize()):
    results.append(queue.get())

When I tried this with a reduced dataset, everything went smoothly. But when I launched it with the full dataset, the script seemed to hang forever during the process.join() part. In a desperate move, I killed every process except the main one, and execution went on. Now it hangs on queue.get() with no notable CPU or RAM activity.

What can cause this? Is my code properly designed?

nicoco
  • Maybe the code you didn't show is the answer. – romain-aga May 28 '16 at 13:07
  • Does this mean this part is OK? I'm trying again right now with one queue per process, I hope this will help... – nicoco May 28 '16 at 13:10
  • I didn't try, but at first look, it seems ok to me – romain-aga May 28 '16 at 13:14
  • It definitely looks like it's getting stuck (or maybe just taking a long time) inside `do some things with some_data`. Add some logging or print statements to make sure it's moving. Also what you're doing is very standard and can be reduced to a call to [`multiprocessing.Pool.map`](https://docs.python.org/2/library/multiprocessing.html). – Alex Hall May 28 '16 at 13:48
  • My problem was previously discussed here I think: http://stackoverflow.com/questions/26738648/script-using-multiprocessing-module-does-not-terminate . Unfortunately I don't understand the answer. If I empty the queue before joining, won't I miss some results from the latest processes I launched? – nicoco May 28 '16 at 14:24
  • See the subsection titled **Joining processes that use queues** in the multiprocessing [Programming guidelines](https://docs.python.org/2/library/multiprocessing.html#programming-guidelines) documentation. There, the example which will deadlock looks a lot like what you're doing. I think that is what the answer to other question is talking about. – martineau May 28 '16 at 15:13
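The guideline martineau points to can be sketched as follows: drain the queue before joining the workers, since a child process cannot exit while its feeder thread still has data buffered in the underlying pipe, so joining first can deadlock once the results exceed the pipe buffer. The `worker`/`run` names and the toy squaring step are illustrative placeholders, not the question's real processing code:

```python
from multiprocessing import Process, Queue

def worker(item, queue):
    # placeholder for the real per-item processing
    queue.put(item * item)

def run(items):
    queue = Queue()
    processes = []
    for item in items:
        p = Process(target=worker, args=(item, queue))
        p.start()
        processes.append(p)

    # Drain the queue BEFORE joining: a child cannot terminate while
    # its queue feeder thread still has buffered data, so calling
    # join() first can deadlock with a large enough result set.
    results = [queue.get() for _ in processes]

    for p in processes:
        p.join()
    return results

if __name__ == "__main__":
    print(sorted(run([1, 2, 3, 4])))  # [1, 4, 9, 16]
```

Getting exactly one item per started process also avoids relying on `qsize()`, which is approximate and not implemented on all platforms.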

0 Answers