The code below is extracted from a much longer script. The sequential version (without multiprocessing) works fine. However, when I use Pool, the script gets stuck at a specific line.
I'd like to apply the same function crop_image in parallel to the medical imaging volumes of a group of subjects, whose paths are retrieved from the lists all_subdirs and all_files. The function loads each subject's volume from its path with nib and then extracts two 3D patches from it: the first patch has shape 40x40x40 and the second one has shape 80x80x80. Both patches have the same center.
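To illustrate what I mean by two patches sharing the same center, here is a minimal NumPy sketch (extract_patches is just an illustrative helper with made-up names, not part of my script, and it assumes the big patch fits entirely inside the volume):

```python
import numpy as np

def extract_patches(volume, center, small=40, big=80):
    """Extract two concentric cubic patches around center (y, x, z)."""
    y, x, z = center
    hs, hb = small // 2, big // 2  # half-sizes of the two cubes
    small_patch = volume[y - hs:y + hs, x - hs:x + hs, z - hs:z + hs]
    big_patch = volume[y - hb:y + hb, x - hb:x + hb, z - hb:z + hb]
    return small_patch, big_patch

vol = np.zeros((160, 160, 160))
s, b = extract_patches(vol, (80, 80, 80))
print(s.shape, b.shape)  # (40, 40, 40) (80, 80, 80)
```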
In the simplified example, I only load two subjects. Both processes start, because the print inside the function does produce output:
>>> sub-001_ses-20101210_brain.nii.gz
>>> sub-002_ses-20110815_brain.nii.gz
However, the program then hangs indefinitely when it has to perform tf.image.per_image_standardization on the 80x80x80 patch. I suspect it's a memory issue, because if I also set the big-scale patch to 40x40x40 (or smaller), the script runs without problems.
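For reference, per the TensorFlow docs, tf.image.per_image_standardization computes (x - mean) / max(std, 1/sqrt(N)). A pure-NumPy sketch of that formula (my own re-implementation, not code from my script) is what I used to convince myself the operation itself is cheap even on an 80x80x80 patch:

```python
import numpy as np

def standardize(patch):
    # NumPy equivalent of tf.image.per_image_standardization:
    # subtract the mean, divide by max(std, 1/sqrt(num_elements))
    patch = patch.astype(np.float64)
    adjusted_std = max(patch.std(), 1.0 / np.sqrt(patch.size))
    return (patch - patch.mean()) / adjusted_std

big = np.random.rand(80, 80, 80)
out = standardize(big)  # runs instantly, so the op itself is not heavy
```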
What could I try? Am I doing something wrong?
The following version actually works, but it is greatly simplified compared with the real one, which doesn't:
import os
import multiprocessing as mp

import nibabel as nib
import numpy as np
import tensorflow as tf
def crop_image(subdir_path, file_path):
    print(file_path)
    small_scale = []
    big_scale = []
    nii_volume = nib.load(os.path.join(subdir_path, file_path)).get_fdata()  # load volume with nibabel and extract np array
    rows_range, columns_range, slices_range = nii_volume.shape  # save volume dimensions
    for y in range(20, rows_range, 40):  # loop over rows
        for x in range(20, columns_range, 40):  # loop over columns
            for z in range(20, slices_range, 40):  # loop over slices
                small_patch = nii_volume[y - 20:y + 20, x - 20:x + 20, z - 20:z + 20]  # extract small patch
                big_patch = nii_volume[y - 40:y + 40, x - 40:x + 40, z - 40:z + 40]  # extract big patch
                small_patch = tf.image.per_image_standardization(small_patch)  # standardize small patch
                small_scale.append(small_patch)  # append small patch to external list

                # HERE THE CODE GETS STUCK AND EVERYTHING BELOW IS NOT EXECUTED
                big_patch = tf.image.per_image_standardization(big_patch)  # standardize big patch
                big_scale.append(big_patch)  # append big patch to external list

    # create tf.Dataset with the lists (small_scale and big_scale)
    # etc..
    final_results = 1  # invented number for the example
    return final_results
if __name__ == '__main__':
    all_subdirs = ['/home/newuser/Desktop/sub-001/ses-20101210/anat', '/home/newuser/Desktop/sub-002/ses-20110815/anat']
    all_files = ['sub-001_ses-20101210_brain.nii.gz', 'sub-002_ses-20110815_brain.nii.gz']

    # DEFINE pool of processes
    num_workers = mp.cpu_count()  # number of available CPUs (logical cores)
    pool = mp.Pool(processes=num_workers)  # create pool object with as many processes as there are CPUs
    outputs = [pool.apply_async(crop_image, args=(subdir, filename)) for subdir, filename in zip(all_subdirs, all_files)]
    results = [output.get() for output in outputs]  # wait for the workers and collect their return values
    pool.close()
    pool.join()
Thank you in advance!