1

Suppose I have an (m x n) 2-d numpy array that are just 0's and 1's. I want to "smooth" the array by running, for example, a 3x3 kernel over the array and taking the majority value within that kernel. For values at the edges, I would just ignore the "missing" values.

For example, let's say the array looked like

import numpy as np

x = np.array([[1, 0, 0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 1, 1, 0],
              [0, 0, 1, 1, 0, 1, 1, 0],
              [0, 0, 1, 0, 1, 1, 1, 0],
              [0, 1, 1, 1, 1, 0, 1, 0],
              [0, 0, 1, 1, 1, 1, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0]])

Starting at the top left "1", a 3 x 3 kernel centered at the first top left element, would be missing the first row and first column. The way I want to treat that is just ignore that and consider the remaining 2 x 2 matrix:

1 0
0 0

In this case, the majority value is 0, so set that element to 0. Repeating this for all elements, the resulting 2-d array I would want is:

0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0
0 0 1 1 1 1 1 0
0 0 1 1 1 1 1 0
0 0 1 1 1 1 1 0
0 0 1 1 1 1 0 0
0 0 0 0 0 0 0 0

How do I accomplish this?

Vincent
  • 7,808
  • 13
  • 49
  • 63
  • 2
    Have you seen [rank_filter](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.rank_filter.html) in scipy? It might not solve the problem. But it reminds me of that. – gnodab Apr 13 '20 at 21:59

2 Answers2

2

You can use skimage.filters.rank.majority to assign to each value the most occuring one within its neighborhood. The 3x3 kernel can be defined using skimage.morphology.square:

from skimage.filters.rank import majority
from skimage.morphology import square

majority(x.astype('uint8'), square(3))

array([[0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 0, 0],
       [0, 0, 1, 1, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)

Note: You'll need the latest stable version of scikit-image for majority. More here

yatu
  • 86,083
  • 12
  • 84
  • 139
  • Also, this seems to have the boundary behavior as I intended in the original question. Looks like at the boundaries, it ignores the "missing values", and then in the case of a tie, it takes the lower value? Tried looking at the docs: https://scikit-image.org/docs/dev/api/skimage.filters.rank.html?highlight=otsu%20filter#skimage.filters.rank.majority but doesn't seem like it obviously documents behaviors at the boundary or case of tie. – Vincent Apr 13 '20 at 23:37
  • No, there's not much in the docs. I think you'd have to look [here](https://github.com/scikit-image/scikit-image/blob/master/skimage/filters/rank/generic_cy.pyx). @vincent – yatu Apr 14 '20 at 07:16
0

I ended up doing something like this (which is based off of How do I use scipy.ndimage.filters.gereric_filter?):

import scipy.ndimage.filters
import scipy.stats as scs


def filter_most_common_element(a, w_k=np.ones(shape=(3, 3))):
    """
    Creating a function for scipy.ndimage.generic_filter.

    See https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.generic_filter.html for more information
    on generic filters. 

    This filter takes a kernel of np.ones() to find the most common element in the array.
    Based off of https://stackoverflow.com/questions/61197364/smoothing-a-2-d-numpy-array-with-a-kernel
    """
    a = a.reshape(w_k.shape)
    a = np.multiply(a, w_k)

    # See https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.stats.mode.html
    most_common_element = scs.mode(a, axis=None)[0][0]
    return most_common_element
x = np.array([[1, 0, 0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 1, 1, 0],
              [0, 0, 1, 1, 0, 1, 1, 0],
              [0, 0, 1, 0, 1, 1, 1, 0],
              [0, 1, 1, 1, 1, 0, 1, 0],
              [0, 0, 1, 1, 1, 1, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0]])

out = scipy.ndimage.filters.generic_filter(x, filter_most_common_element, footprint=np.ones((3,3)),mode='constant',cval=0.0)

out
array([[0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 0, 0],
       [0, 0, 1, 1, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 1, 0],
       [0, 0, 1, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0]])
Vincent
  • 7,808
  • 13
  • 49
  • 63
  • Ah, just realized this does not handle the boundaries in the way that I specified in the original post. What `mode='constant',cval=0.0` does, is that for the "missing values" at the boundaries, it assumes a constant value of 0 (instead of ignoring them completely) – Vincent Apr 13 '20 at 23:30