0

I have large 2D/3D NumPy arrays of binary values. 1s represent boundaries and 0s represent regions. Accompanied by a step size array that indicates the size of step in each dimension.

I am looking for an efficient program that can find (one of) the nearest boundary element to a given element. Distance is euclidean distance.

A 2D example:

import seaborn as sns
import numpy as np

step_size = np.array([5,5])  # size of step in each dimension
arr = np.array([[0,0,0,0,0,0,0,0,1,0],[1,0,0,0,0,0,0,1,0,0],[1,0,0,0,0,0,0,1,0,0],[0,1,0,0,0,0,1,0,0,0],[0,0,1,0,0,0,0,1,0,0],[0,0,1,1,1,1,1,0,0,0]])
sns.heatmap(arr, annot=True, cbar=False, linewidths=.3)

a = (0,2)  # a given element index
b = (1,0)  # nearest boundary element index, which is to be found by the program

a_coor = np.multiply(np.array(a), step_size)  # array([0,10])
b_coor = np.multiply(np.array(b), step_size)  # array([5,0])

distance = np.linalg.norm(a_coor-b_coor)  # 11.18

enter image description here

Georgy
  • 12,464
  • 7
  • 65
  • 73
Reveille
  • 4,359
  • 3
  • 23
  • 46
  • 1
    For 2D arrays see also: [Find the nearest nonzero element and corresponding index in a 2d NumPy array](https://stackoverflow.com/q/43306291/7851470). – Georgy Apr 29 '20 at 17:44

2 Answers2

1

You could find the locations of all 1s, obtain the euclidean distances to the given location, and return the smallest using argpartition on the result:

def nearest_boundary(x, loc):
    ones = np.c_[np.where(arr == 1)]
    dist = ((ones - loc)**2).sum(1)
    return ones[dist.argpartition(0)[0]]

Some examples:

nearest_boundary(arr, (0,2))
# array([1, 0], dtype=int64)

nearest_boundary(arr, (2,4))
# array([3, 6], dtype=int64)

With a 3D array:

np.random.seed(2)
arr = np.random.choice([0,1], p=[0.8,0.2], size=(3,5,4))

array([[[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [1, 0, 1, 0]],

       [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[1, 0, 1, 0],
        [0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 1]]])

nearest_boundary(arr, (0,3,0))
# array([0, 4, 0], dtype=int64)
yatu
  • 86,083
  • 12
  • 84
  • 139
  • 1
    Thanks! I was actually thinking of a method that would first find the surrounding boundary and does the distance calculations for those only. (like, if the regions are countries, it does not calculate distance to all the boundaries in the world). But I check if this is efficient enough for my cases. – Reveille Apr 29 '20 at 17:41
1

Definitely a job for a nearest neighbour search:

import seaborn as sns
import numpy as np
from sklearn.neighbors import KDTree
from matplotlib import pyplot as plt

step_size = np.array([5,5])  # size of step in each dimension
arr = np.array([[0,0,0,0,0,0,0,0,1,0],[1,0,0,0,0,0,0,1,0,0],[1,0,0,0,0,0,0,1,0,0],[0,1,0,0,0,0,1,0,0,0],[0,0,1,0,0,0,0,1,0,0],[0,0,1,1,1,1,1,0,0,0]])
sns.heatmap(arr, annot=True, cbar=False, linewidths=.3)

# get boundary pts (+0.5 to center for plot)
boundary = np.column_stack(np.nonzero(arr)) + 0.5

# create tree
tree = KDTree(boundary)

# get zero points to test for
zeros = np.column_stack(np.nonzero(~arr)) + 0.5

# get nearest neighbour boundary point for each zero
distance, index = tree.query(zeros)

# plot the results
for i, pt in enumerate(zeros):
    plt.gca().plot([pt[1], boundary[index[i,0]][1]], [pt[0], boundary[index[i,0]][0]], 'r')

The KDTree can calculate k neighbours easily and gives you back both the Euclidean distance and the index of the result in the original tree. A really useful tool. See the plot below:

First nearest neighbour

Also the result for second nearest neighbours, pass k=2 to query and plot with:

# plot the results
colors = ['r', 'b']
for i, pt in enumerate(zeros):
    for k in range(index.shape[1]):
        plt.gca().plot([pt[1], boundary[index[i,k]][1]], [pt[0], boundary[index[i,k]][0]], colors[k])

second nearest neighbour

Paddy Harrison
  • 1,808
  • 1
  • 8
  • 21
  • Thanks! Really impressive. How can the `step_size` also be included? Here for simplicity I set them all to 5, but they are not equal necessarily. – Reveille Apr 29 '20 at 20:58
  • The `KDTree` just needs an array of points, they don't have to be integers so you could include that there. Alternatively, to fit with the question here, the Cartesian distance between neighbours is given by `delta = boundary[i] - boundary[index[i,0]]` so you could easily add any step size here, even uneven step sizes, eg. `d = delta * (6,4)` where `(6,4)` is your step size. I would then use `np.linalg.norm(d)` to get your Euclidean distance. I hope that helps! – Paddy Harrison Apr 29 '20 at 21:47
  • Thanks you. There is this other related [question](https://stackoverflow.com/questions/61022427/find-nearest-transition-in-n-dimensional-array) of mine that now I think could be solved using a similar approach. At least for 2D/3D versions of it, which would be still fine. – Reveille May 02 '20 at 15:04
  • 1
    @Reveille I just had a look at your linked question and I think a nearest neighbour search would work well in that case too. – Paddy Harrison May 03 '20 at 11:10