Python numpy array -- close smallest regions

Question

I have a 2D boolean numpy array that represents an image, on which I call skimage.measure.label to label each segmented region. This gives me a 2D array of ints in the range [0, 500], where each value is the region label for that pixel. I would now like to remove the smallest regions: if my input array has shape (n, n), all labeled regions of fewer than m pixels should be subsumed into the larger surrounding regions. For example, if n=10 and m=5, my input could be:

0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 7, 8, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 2, 2, 2, 1, 1
4, 4, 4, 4, 2, 2, 2, 2, 1, 1
4, 6, 6, 4, 2, 2, 2, 3, 3, 3
4, 6, 6, 4, 5, 5, 5, 3, 3, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5

and the output would then be:

0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1  # 7 and 8 are replaced by 0
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 2, 2, 2, 1, 1
4, 4, 4, 4, 2, 2, 2, 2, 1, 1
4, 4, 4, 4, 2, 2, 2, 3, 3, 3  # 6 is gone, but 3 remains
4, 4, 4, 4, 5, 5, 5, 3, 3, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5

I've looked into skimage morphology operations, including binary closing, but none seem to work well for my use case. Any suggestions?

@JonasAdler assuming here `m=1`, tie-breaks like this don't matter. Morphology operations that run left-right would likely yield `0 0 0 2 2`, but `0 0 2 2 2` is fine as well. — BoltzmannBrain, Sep 04 '17 at 20:23
Related question: https://stackoverflow.com/questions/46126409/numpy-filter-to-smooth-out-zero-regions @JonasAdler please take a look if you can, thanks! — BoltzmannBrain, Sep 09 '17 at 01:40

score 1 · Accepted Answer · answered Sep 04 '17 at 21:04

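You can do this by performing a binary dilation on the boolean mask of each label. A dilated mask covers the region plus a one-pixel border, so for any small region you can count how many of its pixels fall inside each other label's dilated mask, i.e. how strongly it borders each neighbour. Using this you can then replace the small region with its most common neighbour.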
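
To illustrate the idea in isolation, here is a minimal sketch on a made-up 4x4 array (the array `labels` and all variable names are hypothetical, not from the question). It uses a slightly different but related computation from the code below: it dilates one region's mask, removes the region itself to leave a one-pixel ring, and counts which labels occur in that ring.

```python
import numpy as np
from scipy import ndimage

# Hypothetical label image: a 2-pixel region (label 2) inside label 1,
# with label 0 along the right edge.
labels = np.array([[1, 1, 1, 0],
                   [1, 2, 2, 0],
                   [1, 1, 1, 0],
                   [1, 1, 1, 0]])

mask = labels == 2
# The default structuring element is a cross, so only the four direct
# neighbours of each pixel are reached (no diagonals).
ring = ndimage.binary_dilation(mask) & ~mask

# Count how often each label occurs in the ring around region 2.
neighbour_counts = np.bincount(labels[ring], minlength=labels.max() + 1)
most_common = int(np.argmax(neighbour_counts))
```

Here the ring holds five pixels of label 1 and one pixel of label 0, so label 1 is the most common neighbour of region 2.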

Example code:

import numpy as np
import scipy.ndimage

m = 5

arr = [[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 7, 8, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
       [4, 4, 4, 4, 2, 2, 2, 2, 1, 1],
       [4, 6, 6, 4, 2, 2, 2, 3, 3, 3],
       [4, 6, 6, 4, 5, 5, 5, 3, 3, 5],
       [4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
       [4, 4, 4, 4, 5, 5, 5, 5, 5, 5]]
arr = np.array(arr)
nval = np.max(arr) + 1

# Compute number of occurrences of each number
counts, _ = np.histogram(arr, bins=range(nval + 1))

# Compute the set of neighbours for each number via binary dilation
c = np.array([scipy.ndimage.binary_dilation(arr == i)
              for i in range(nval)])

# Loop over the labels with too few pixels and replace each with its most
# common neighbour
for i in filter(lambda i: counts[i] < m, range(nval)):
    mask = arr == i
    overlap = np.sum(c[:, mask], axis=1)
    overlap[i] = 0  # a region always lies inside its own dilation; skip it
    arr[mask] = np.argmax(overlap)

This gives the expected result:

>>> arr.tolist()
[[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
 [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
 [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
 [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
 [0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
 [4, 4, 4, 4, 2, 2, 2, 2, 1, 1],
 [4, 4, 4, 4, 2, 2, 2, 3, 3, 3],
 [4, 4, 4, 4, 5, 5, 5, 3, 3, 5],
 [4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
 [4, 4, 4, 4, 5, 5, 5, 5, 5, 5]]
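
For reuse, the same idea could be wrapped in a function. The following is only a sketch under some assumptions: the name `merge_small_regions` and its parameters are made up for illustration, labels are assumed to be non-negative integers (as produced by skimage.measure.label), and it votes over the one-pixel ring around each small region rather than stacking a dilated mask for every label, which avoids holding one full-size array per label in memory.

```python
import numpy as np
from scipy import ndimage


def merge_small_regions(labels, min_size):
    """Return a copy of `labels` in which every region with fewer than
    `min_size` pixels takes the most common label on its one-pixel border."""
    out = np.array(labels, copy=True)
    counts = np.bincount(out.ravel())
    # Visit only labels that are present and too small.
    for lab in np.flatnonzero((counts > 0) & (counts < min_size)):
        mask = out == lab
        # Default cross-shaped structuring element: direct neighbours only.
        ring = ndimage.binary_dilation(mask) & ~mask
        if not ring.any():
            # The region fills the whole array; there is nothing to merge into.
            continue
        # Ties go to the lowest label, because argmax returns the first maximum.
        out[mask] = np.bincount(out[ring]).argmax()
    return out
```

Called as `merge_small_regions(arr, 5)` on the original input array, this should reproduce the result shown above. Small regions are processed in increasing label order and each vote sees the merges already made, so a tiny region bordering another tiny region can end up in whichever label that neighbour was merged into.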

This appears to work like a charm, thanks! At first glance the only changes I would make would be to use iterators where applicable -- `xrange` and `itertools.ifilter`. — BoltzmannBrain, Sep 04 '17 at 21:27
Also, in practice w/ larger arrays I'm finding the dilation operation works better when running until convergence w/ arg `iterations=-1`. — BoltzmannBrain, Sep 04 '17 at 22:11
