Vectorized interpretation of distance matrix

Question

I have a few points and like to determine if they are in a specific distance to each other. If they are, I want to merge them into one point. I build a search tree and got a distance matrix from it. Is there an elegant (without slow loops if possible) way to determine which points are in specific distance without using some complex cluster algorithm (kmeans, hierarchical, etc.)?

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.neighbors import radius_neighbors_graph

RADIUS = 0.025
points = np.array([
    [13.2043373032, 52.3818529896],
    [13.0530692845, 52.3991668707],
    [13.229309674, 52.3840231],
    [13.489018081, 52.4180538095],
    [13.3209738098, 52.6375963146],
    [13.0160362703, 52.4187139243],
    [13.0448485, 52.4143229343],
    [13.32478977, 52.5090253],
    [13.35514839, 52.5219323867],
    [13.1982523828, 52.3592620828]
])

tree = NearestNeighbors(n_neighbors=2, radius=RADIUS, leaf_size=30, algorithm="auto", n_jobs=1).fit(points)
nnGraph = radius_neighbors_graph(tree, RADIUS, mode='distance', include_self=False)

print nnGraph

(0, 9)        0.0233960536484
(1, 6)        0.0172420289306
(6, 1)        0.0172420289306
(9, 0)        0.0233960536484

score 1 · Answer 1 · answered Feb 02 '16 at 08:26

You can use pdist and squareform from scipy.spatial.distance for a vectorized solution, like so -

from scipy.spatial.distance import pdist, squareform

# Get pairwise euclidean distances              
dists = squareform(pdist(points))

# Get valid distances mask and the corresponding indices
mask = dists < RADIUS
np.fill_diagonal(mask,0)
idx = np.argwhere(mask)

# Present indices and corresponding distances as zipped output
out = zip(map(tuple,idx),dists[idx[:,0],idx[:,1]])

Sample run -

In [91]: RADIUS
Out[91]: 0.025

In [92]: points
Out[92]: 
array([[ 13.2043373 ,  52.38185299],
       [ 13.05306928,  52.39916687],
       [ 13.22930967,  52.3840231 ],
       [ 13.48901808,  52.41805381],
       [ 13.32097381,  52.63759631],
       [ 13.01603627,  52.41871392],
       [ 13.0448485 ,  52.41432293],
       [ 13.32478977,  52.5090253 ],
       [ 13.35514839,  52.52193239],
       [ 13.19825238,  52.35926208]])

In [93]: out
Out[93]: 
[((0, 9), 0.023396053648436933),
 ((1, 6), 0.017242028930573985),
 ((6, 1), 0.017242028930573985),
 ((9, 0), 0.023396053648436933)]

@user2638109 If this solved your problem, consider accepting it by clicking on the hollow checkmark next to the solution. More info - http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work — Divakar, Feb 02 '16 at 18:10

score 0 · Answer 2 · edited May 23 '17 at 11:45

For small point counts (<50) it's a bit faster to use complex numbers. Found this in another post: Efficiently Calculating a Euclidean Distance Matrix Using Numpy

pointsCmplx = np.array([points[...,0] + 1j * points[...,1]])
dists = abs(pointsCmplx.T - pointsCmplx)

My goal is to get non overlapping points in respect of the radius. I took your code and deleted the lower triangular matrix and at the end I simply deleted the second point. The points are sorted by a specific observation. Lower index means more important. Any other suggestions to efficiently merge near clusters instead of deleting points? Iam looking for a very fast solution and don't want to use some complex clustering algorithm.

# overlapping points
points = np.array([
    [13.2043373032, 52.3818529896],
    [13.0530692845, 52.3991668707],
    [13.229309674, 52.3840231],
    [13.489018081, 52.4180538095],
    [13.3209738098, 52.6375963146],
    [13.0160362703, 52.4187139243],
    [13.0448485, 52.4143229343],
    [13.32478977, 52.5090253],
    [13.35514839, 52.5219323867],
    [13.1982523828, 52.3592620828],
    [13.1982523828, 52.3592620830]       # nearly identical
])

dists = squareform(pdist(points))
mask = dists < RADIUS
np.fill_diagonal(mask,0)

# delete lower triangular matrix
mask = np.triu(mask)
idx = np.argwhere(mask)

# delete the target ids
idx = idx[:,1]   
points = np.delete(points, idx, 0)

Vectorized interpretation of distance matrix

2 Answers2