Speed up image labelling

Question

I have a stack of 1000 images with more than 120k objects. After running a Connected Component Algorithm in a third party software, I obtained a coloured image, in which each connected object has a label, a color and an ID value assigned.

I need to extract the list of the voxels coordinates that make up each of the objects.

I have written a Python code to extract this information but it is extremely slow since it processes ~30 labels/min. Therefore, it would take more than 2 days to process the entire scan in the best scenario.

After converting the stack into a 3D Numpy array (img_stack_numpy), here is the main part of the code:

# Store the labels and its associated voxels in a dictionary
labels, counts = np.unique(img_stack_np, return_counts=True)
labels_list = list(labels)
dict_labels_and_voxels = {}

for label in labels_list:
    
    if label==0:  # We don't want to store the label 0 (background)
        continue
    
    index = np.where(img_stack_np == label)
    dict_labels_and_voxels[label] = [index]

How could I improve the speed of it?

What is the type of (the items in) `img_stack_np`? Do you really need a `dict` type for dict_labels_and_voxels? What is the typical size of `img_stack_np` and `labels_list`? — Jérôme Richard, Apr 15 '21 at 19:37
img_stack_up is a 700x700x1000 numpy array (float32) and label_list contains 120K. The idea is store the voxels coordinates for each label in a dictionary where the key is the label itself. — pedro_galher, Apr 15 '21 at 19:46
Ok thank you. Why do you use float32 and not integers (like int8, int16, int32 or even int64) for the image? It means the labels will be float32 too and thus the key of the dict too. This is really unsafe due to the nature of float32 -based arithmetic (ie. precision issues). Not to mention it strongly impacts computation time & memory usage. — Jérôme Richard, Apr 15 '21 at 19:59
Hi thanks for your answer. I have just tried with integers and sadly there is no change in speed :( Maybe I have to accept that this process is slow due to the huge number of labels and voxels in the stack. — pedro_galher, Apr 15 '21 at 20:09
Please be more precise about the exact dimensionality of everything. You say *"images"* implying 2-D but talk about *"voxels"* implying 3-D. — Mark Setchell, Apr 15 '21 at 21:01
Exactly like that. I have 1000 2D images, that if stack together form a 3D image, like the ones generated in the CT. — pedro_galher, Apr 15 '21 at 21:03

score 3 · Accepted Answer · answered Apr 15 '21 at 23:17

The main issue with the current code lies in the line np.where(img_stack_np == label). Indeed, it iterate over the 700 * 700 * 1000 = 490,000,000 values of img_stack_np for the 120,000 values of label_list resulting in 490,000,000 * 120,000 = 58,800,000,000,000 values to check.

You do not need to iterate over img_stack_np for every label. What you need is to classify the "voxels" by their value (ie. label). You can do that using a custom sort:

first, store the position of each voxel in an array with the label;
then, do a key-value sort where the label is the key and the voxel position is the value;
then, iterate through the sorted items to group them by label (or use the less efficient np.unique for sake of simplicity);
finally store the position of each group in the final dict.

For sake of simplicity and to limit the memory usage, an index-based sort can also be used in replacement to the key-value sort. This can be done with argsort. Here is an example code:

s1, s2, s3 = img_stack_np.shape

# Classification by label.
# Remove the "kind='stable'" argument if you do not care about the ordering 
# of the voxel positions for a given label in the resulting dict (much faster).
index = np.argsort(img_stack_np, axis=None, kind='stable')
labels = img_stack_np.reshape(img_stack_np.size)[index]

# Generate the associated position
i1 = np.arange(s1).repeat(s2*s3)[index]
i2 = np.tile(np.arange(s2), s1).repeat(s3)[index]
i3 = np.tile(np.arange(s3), s1*s2)[index]

groupLabels, groupSizes = np.unique(img_stack_np, return_counts=True)
groupOffsets = np.concatenate(([0], counts.cumsum()))

dict_labels_and_voxels = {}

for i,label in enumerate(groupLabels):
    if label==0:
        continue
    start, end = groupOffsets[i], groupOffsets[i] + groupSizes[i]
    index = (i1[start:end], i2[start:end], i3[start:end])
    dict_labels_and_voxels[label] = [index]

Here are the results on my machine using a 100x100x1000 integer-based random input with 12000 labels:

Reference algorithm:                   363.29 s
Proposed algorithm (strict ordering):    3.91 s
Proposed algorithm (relaxed ordering):   2.02 s

Thus, the proposed implementation is 180 times faster in this case.
It should be several thousand times faster for your specified input.

Additional notes:

Using float32 values in img_stack_np takes a lot of memory: 700 * 700 * 1000 * 4 ~= 2 GB. Use float16 or even in8/int16 types if you can. Storing the position of each voxel also takes a lot of memory: from 3 GB (with int16) up to 12 GB (with int64). Consider using a more compact data representation or a smaller dataset. For further improvement, look at this this post to replace the quite slow np.unique call.

Wow! I'm really looking forward to arrive home this afternoon to implement it. It is an impressively elegant solution! — pedro_galher, Apr 16 '21 at 07:05
I have just implemented the code. It took only 2 minutes to run! This is so amazing. Really you saved me so much time in my research. I wish I could upvote your answer more times. I have just a question: Could you explain what is the logic when extracting the i1, i2 and i3? What do you do here? My programming logic is quite poor (only 1,5yr programming) — pedro_galher, Apr 16 '21 at 12:23
Yeah, this part is a bit tricky. I generate the position of each point. `i1` is the least contiguous dimention and `i3` is the most contiguous one. The first generate values like `[0, 0, 0, ..., 0, 1, 1, 1, ..., 1, 2, 2, ...]` using `np.repeat`. The number of repeated element is the number of point in each contiguous plan (of size `s2*s3`) and there are `s1` plans. For the last, I generate values like `[0, 1, 2, ..., s3, 0, 1, 2, ..., s3, 0, 1 ...]` using `np.tile`. The code for `i2` is a mix of both `i1` and `i3`. Actually, this is quite similar to what `np.meshgrid` does. — Jérôme Richard, Apr 16 '21 at 23:44
For each array, I reorder the position regarding the result of the sort so each position match with the label once it has been sorted. It is a bit like reapplying the same sorting transformation made on the labels but on the position. Note that you may not need the position: you can just store the index and then find the position of a voxel from the index since the index is the position of the voxel in the flatten voxel matrix and you know its shape. — Jérôme Richard, Apr 16 '21 at 23:49
In the final loop, I just retrieve the group offset and the group size to locate the slice of voxel position associated to a given label (at this time, all the labels are sorted and the voxel positions are sorted based on their associated labels). — Jérôme Richard, Apr 16 '21 at 23:52

Speed up image labelling

1 Answers1

Linked