
The code below is extracted from a much longer script. The sequential version (without multiprocessing) works fine. However, when I use a multiprocessing Pool, the script gets stuck at a specific line.

I'd like to apply the same function crop_image in parallel to the medical imaging volumes of a group of subjects, whose paths are retrieved from the lists all_subdirs and all_files. The function loads the subject's volume from disk with nibabel and then extracts two 3D patches from it: the first patch has shape 40x40x40 and the second one has shape 80x80x80. Both patches have the same center.

In the simplified example, I only load two subjects. Both processes do start, because the print inside the function is indeed executed:

>>> sub-001_ses-20101210_brain.nii.gz
>>> sub-002_ses-20110815_brain.nii.gz

However, the program then hangs indefinitely when it has to perform tf.image.per_image_standardization on the 80x80x80 patch. I suspect it is a memory/space issue, because if I also set the big-scale patch to 40x40x40 (or smaller), the script runs without problems.

What could I try? Am I doing something wrong?
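
One thing I was planning to try as a sanity check is to replace the TF call inside the workers with a plain NumPy standardization, to see whether it is TensorFlow itself that hangs inside the child processes. This is just a sketch based on my understanding of what per_image_standardization does (subtract the mean and divide by the standard deviation, floored at 1/sqrt(N)); standardize_patch_np is a name I made up:

import numpy as np

def standardize_patch_np(patch):
    # rough NumPy stand-in for tf.image.per_image_standardization:
    # zero mean, unit variance, with the stddev floored at 1/sqrt(num_elements)
    patch = patch.astype(np.float32)
    adjusted_stddev = max(patch.std(), 1.0 / np.sqrt(patch.size))
    return (patch - patch.mean()) / adjusted_stddev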

The following is a simplified version of the actual script; it is much shorter, but it reproduces the same hang:

import os
import multiprocessing as mp

import nibabel as nib
import numpy as np
import tensorflow as tf


def crop_image(subdir_path, file_path):
    print(file_path)
    small_scale = []
    big_scale = []

    nii_volume = nib.load(os.path.join(subdir_path, file_path)).get_fdata()  # load volume with nibabel and extract np array

    rows_range, columns_range, slices_range = nii_volume.shape  # save volume dimensions

    for y in range(20, rows_range, 40):  # loop over rows
        for x in range(20, columns_range, 40):  # loop over columns
            for z in range(20, slices_range, 40):  # loop over slices
                small_patch = nii_volume[y - 20:y + 20, x - 20:x + 20, z - 20:z + 20]  # extract small patch
                big_patch = nii_volume[y - 40:y + 40, x - 40:x + 40, z - 40:z + 40]  # extract big patch
                small_patch = tf.image.per_image_standardization(small_patch)  # standardize small patch
                small_scale.append(small_patch)  # append small patch to external list

                # HERE THE CODE GETS STUCK AND EVERYTHING BELOW IS NOT EXECUTED

                big_patch = tf.image.per_image_standardization(big_patch)  # standardize big patch
                big_scale.append(big_patch)  # append big patch to external list

    # create tf.Dataset with lists (small_scale and big_scale)
    # etc..
    # etc..

    final_results = 1  # invented number for the example

    return final_results

if __name__ == '__main__':
    all_subdirs = ['/home/newuser/Desktop/sub-001/ses-20101210/anat', '/home/newuser/Desktop/sub-002/ses-20110815/anat']
    all_files = ['sub-001_ses-20101210_brain.nii.gz', 'sub-002_ses-20110815_brain.nii.gz']

    # DEFINE pool of processes
    num_workers = mp.cpu_count()  # save number of available CPUs (threads)
    pool = mp.Pool(processes=num_workers)  # create pool object and set as many processes as there are CPUs
    outputs = [pool.apply_async(crop_image, args=(path_pair[0], path_pair[1])) for path_pair in zip(all_subdirs, all_files)]
    results = [output.get() for output in outputs]  # wait for all workers and collect their return values
    pool.close()  # no more tasks will be submitted
    pool.join()  # wait for the worker processes to exit

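For completeness, another thing I was planning to test (after reading that TensorFlow may not play well with the default fork start method on Linux) is forcing the 'spawn' start method, so that every worker gets a fresh interpreter. This is only a sketch of how I would wire it up, not something I have verified fixes my case:

if __name__ == '__main__':
    ctx = mp.get_context('spawn')  # 'spawn' starts each worker in a fresh interpreter instead of forking
    with ctx.Pool(processes=ctx.cpu_count()) as pool:
        outputs = [pool.apply_async(crop_image, args=pair) for pair in zip(all_subdirs, all_files)]
        results = [output.get() for output in outputs]  # block until every worker has finished
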
Thank you in advance!

Tommaso Di Noto
    did you solve it? for me it was a memory issue, FYI: https://stackoverflow.com/questions/75784038/python-multiprocessing-application-is-getting-stuck-in-docker-container – ItayB Mar 20 '23 at 17:57

0 Answers