Performant way to fill holes for categorical data?

Question

We have 3D segmentation masks where every class has its own label / ID. For every class we would like to fill holes in the segmentation.

For an example, the following matrix:

[
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 0, 1, 2, 0, 0 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 0, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 4, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
]

Should result in

[
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 0, 0 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 4, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
]

The only filled holes are the 1 and 3 in the middle slice. The 2 shape is open to the side and the 4 is open to the back. The 0 between the classes should stay untouched.

I implemented 7 versions using the existing scipy.ndimage.morphology.binary_fill_holes function (or its implementation) and numpy. Here the two best versions so far:

import numpy as np
from scipy.ndimage.morphology import binary_fill_holes, label, generate_binary_structure, binary_dilation

def fill_holes6(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
    output = np.zeros_like(img)
    for i in applied_labels:
        output[binary_fill_holes(img == i)] = i

    return output

def fill_holes7(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
    output = np.zeros(img.shape, dtype=int)
    for i in applied_labels:
        tmp = np.zeros(img.shape, dtype=bool)
        binary_dilation(tmp, structure=None, iterations=-1, mask=img != i, origin=0, border_value=1, output=tmp)
        output[np.logical_not(tmp)] = i
        
    return output

# EDIT: Added the following method:
def fill_holes8(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
    connectivity = 1
    footprint = generate_binary_structure(img.ndim, connectivity)
    background_mask = img == 0
    components, num_components = label(background_mask, structure=footprint)
    filled_holes = np.zeros_like(img)
    for component_label in range(1, num_components + 1):
        component_mask = components == component_label
        component_neighborhood = np.pad(img, 1, constant_values=-1)[binary_dilation(np.pad(component_mask, 1), structure=footprint)]

        neighbor_labels = np.unique(component_neighborhood)
        if len(neighbor_labels) == 2 and -1 not in neighbor_labels:
            neighbor_label = neighbor_labels[1]
            filled_holes[component_mask] = neighbor_label

    return img + filled_holes

I measured the performance the following way (matching my real world data distribution):

import time
import pandas as pd

def measure(funs, t):
    res = []
    for _ in range(t):
        ra = np.random.randint(10, 40)
        sh = np.random.randint(200, 400, 3)
        img = np.random.randint(0, ra, sh)

        applied_labels = np.unique(img)[1:]

        fun_res = []
        for fun in funs:
            start = time.time()
            fun(img, applied_labels)
            end = time.time()
            fun_res.append(end - start)
        res.append(fun_res)
    return np.min(res, axis=0), np.max(res, axis=0), np.mean(res, axis=0), np.std(res, axis=0)

print(measure([fill_holes6, fill_holes7], t=10))

For my first implementations I got the following execution times (t=100):

	fill_holes1	fill_holes2	fill_holes3
min	6.4s	6.9s	6.2s
max	83.7s	96.0s	80.4s
mean	32.9s	37.3s	31.6s
std	17.3s	20.1s	16.5

This is very slow. The last implementation fill_holes7 is only 1.27 times faster than fill_holes3.

Is there a more performant way of doing this?

Opened a feature request on the scipy project first but was asked to go to stackoverflow first: https://github.com/scipy/scipy/issues/14504

EDIT:

I also opened a feature request on the MONAI project. See #2678 For this I opened a pull request with the iterative erosion solution (fill_holes7). You can find the documentation here: monai.transforms.FillHoles

During this I also implemented a connected component labeling (CCL) based version. See the implementation in MONAI here. I added fill_holes8 above, which is basically that implementation.

The MONAI package is happy for any pull request improving the performance of this method. Feel free to go there, open an issue and a pull request.

@MarkSetchell With diagram you mean like an example visualizing the "fill holes" problem? I now added matrices to visualize that. — Spenhouet, Aug 01 '21 at 11:56
Can watersheding or closing functions help you here? Look at things like https://stackoverflow.com/questions/11294859/how-to-define-the-markers-for-watershed-in-opencv. Seems like it's the same problem. — tupui, Aug 01 '21 at 12:35
@tupui I don't see how these methods would solve my problem. They would also introduce quite a lot of error. I only want to fill in enclosed holes within the same segment / class. Everything else should not change. There is a 0% tolerance here. My example above does contain some edge cases to ensure that this works right. The binary_fill_holes method of scipy works correct but only on a single class. I therefore need to run it per class which makes it extremely expensive. — Spenhouet, Aug 01 '21 at 14:07
I made some micro optimizations and also used the internal implementation of `binary_fill_holes` to save some matrix creations / memory allocations. This brought another x1.27 time improvement but this is still way too slow. @tupui I have to say that I do not fully understand how the `binary_fill_holes` implementation works. It uses the `binary_dilation` but the dilation matrix is all 0 - so I'm unsure what is dilated here. It also makes use of the mask parameter to solve the problem but I don't get how this mask is applied. I can not go deeper until I understand this part. — Spenhouet, Aug 01 '21 at 16:02

score 2 · Answer 1 · answered Aug 08 '21 at 22:00

binary_fill_holes is not very efficiently implemented: it seems to not make use of SIMD instruction and is not parallelized. It is also based on a pretty intensive algorithm (iterative erosion). Since this function is run for each label, your final implementation is very computationally intensive. One solution to fix the performance issue is to redesign your algorithm.

A first step is to keep the iteration over the label and find a more efficient way to fill hole. One efficient solution is to use a flood-fill algorithm on each cell of the border not yet filled and then look for the unfilled cells. These remaining cells should be either holes or cell already set with the current label. Such an algorithm should be quite fast. However, it is not easy to implement it efficiently in Python. There are some implementations of flood-fill in Python (eg. in skimage.morphology), but the cost of calling the function from Python for most border cell would be too high.

An alternative solution is to use a label algorithm to find all the region of the array that are connected each other. This can easily be done using label of skimage.measure. Once labeled, the labeled border regions can be set as not being holes. The remaining one as either holes or regions already set with the right label. This solution is more intensive, especially when the number of label is big (which seems quite rare based on your example and as long as each labels are computed separately). Here is an implementation:

from skimage.measure import label

def getBorderLabels(img):
    # Detection
    lab0 = np.unique(img[:,:,0])
    lab1 = np.unique(img[:,:,-1])
    lab2 = np.unique(img[:,0,:])
    lab3 = np.unique(img[:,-1,:])
    lab4 = np.unique(img[0,:,:])
    lab5 = np.unique(img[-1,:,:])

    # Reduction
    lab0 = np.union1d(lab0, lab1)
    lab2 = np.union1d(lab2, lab3)
    lab4 = np.union1d(lab4, lab5)
    return np.union1d(np.union1d(lab0, lab2), lab4)

def getHoleLabels(borderLabels, labelCount):
    return np.setdiff1d(np.arange(1, labelCount+1, dtype=int), borderLabels, assume_unique=True)

def fill_holes8(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
    output = img.copy()
    for i in applied_labels:
        labelized, labelCount = label(img==i, background=True, return_num=True, connectivity=1)
        holeLabels = getHoleLabels(getBorderLabels(labelized), labelCount)
        if len(holeLabels) > 0:
            output[np.isin(labelized, holeLabels)] = i
    return output

This implementation is about 3 times faster on my machine.

Note that it is possible to parallelize the algorithm (eg. using multiple processes) by working on multiple label at the same time. However, one should care to not use too much memory and not write in output in the correct order (similar to the sequential algorithm).

The biggest source of slow-down comes from the separate computation of each label. Once can tune the flood-fill algorithm to write a custom well-optimized implementation fitting you need although this appear to be pretty hard to do. Alternatively, one can tune the label-based implementation to do the same. The second approach is simpler, but not easy either. Many question arise in complex cases: what should happen when cells with a given label L1 form a boundary containing a hole itself containing other cells with a given label L2 forming a boundary containing another hole? What if the boundaries overlap partially each other? Are such cases possible? Should they be investigated, and if yes, what would be the set of accepted outputs?

As long as the labeled boundaries are not forming tricky cases, there is a quite efficient algorithm to track and fill holes with the right labels. I am not sure it always work but here is the idea:

Use a label algorithm to find all connected regions
Build a set of label containing all the labels
Remove the labels associated with border regions from the set
Remove the labels associated with cells already labelled (ie. non-zero cells)
So far, the remaining labels are either holes, fake-holes (or tricky cases assumed not present). Fake-holes are unlabelled cells surrounded by labelled cells with multiple different labels (like the 0 cells in the middle of your example).
Check the label of the cell on the boundaries of each labelled regions. If a labelled region is only surrounded by cells with the same label L3, then it is a hole that must be filled with L3-label cells. Otherwise, this is either a fake-hole (or a tricky case).

The resulting algorithm should be much faster than the reference implementation and the previous one.

I implemented a first version based on the iterative erosion in the MONAI package based on my feature request ([#2678](https://github.com/Project-MONAI/MONAI/issues/2678)) there. See the documentation [here](https://docs.monai.io/en/latest/transforms.html#fillholes). I also had an implementation with CCL (see [here](https://github.com/Spenhouet/MONAI/blob/201bef0d8a6215c26ab91bcdf2c802cdc4e89e24/monai/transforms/utils.py#L737)) but this was way slower. Your CCL implementation has some improvements over mine. The MONAI project is happy to get a PR for an improved version. — Spenhouet, Aug 09 '21 at 06:58
I tested your CCL implementation with real-world data against the iterative erosion implementation. For all labels (without filter) it was faster and took 26s compared to 29s. For a single label it was slower and took 1.6s compared to 0.8s. I did run it on a 6x3,5GHz (12 Threads) CPU with 32 GB RAM. So it looks to be comparable in terms of performance but not significantly enough. Would be cool to see the performance of your proposed implementation. — Spenhouet, Aug 09 '21 at 07:18

Performant way to fill holes for categorical data?

1 Answers1