Python Select data points which are a certain Euclidean distance away from other data points

Question

Let's say I have an array of data, and 3 other arrays corresponding to the x, y, and z location of each data point in space:

import numpy as np

data = np.random.random(10000)
x = np.random.random(10000)
y = np.random.random(10000)
z = np.random.random(10000)

Now, I want to get a subset of the data points which meet some criteria.

Specifically, I want the subset of data points which

a) have a value greater than some threshold t1

b) lie farther than distance d from every data point which has a value greater than t2

What is an efficient way of going about doing this?

This post has info on timing comparisons for calculating the distance: https://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy (comment by Zulfiqaar, Feb 14 '19 at 16:58)

score 0 · Answer 1 · answered Feb 14 '19 at 17:40

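I would use plain matrix multiplication for the cross terms, expand them into squared distances with |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, and compare against the squared threshold, which avoids taking a square root:

import numpy as np

data = np.random.random(10000)
x = np.random.random(10000)
y = np.random.random(10000)
z = np.random.random(10000)
position = np.vstack((x, y, z)).T  # shape (10000, 3)
t1 = 0.5
t2 = 0.3
dmin = 0.1

m1 = data > t1
p1 = position[m1]         # candidate points
p2 = position[data > t2]  # points the candidates must stay away from
# squared distances via |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
sq_dist = (p1**2).sum(axis=1)[:, None] + (p2**2).sum(axis=1)[None, :] - 2 * np.matmul(p1, p2.T)
m2 = (sq_dist > dmin**2).all(axis=1)
data_filtered = data[m1][m2]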
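
One thing to watch with a pairwise test of this kind: when t2 <= t1, every candidate point also belongs to the set it is compared against, so its zero distance to itself would reject it and the result comes back empty. Below is a minimal sketch, assuming that self-comparisons should be ignored (the question does not say either way). The function name far_from_others and the chunk parameter are hypothetical, introduced only for illustration; the loop processes candidates in chunks so the full distance matrix never has to sit in memory at once.

```python
import numpy as np

def far_from_others(data, position, t1, t2, dmin, chunk=256):
    """Boolean mask of points with data > t1 that lie farther than dmin
    from every *other* point with data > t2 (self-comparisons ignored)."""
    cand_idx = np.flatnonzero(data > t1)  # candidate points
    ref_idx = np.flatnonzero(data > t2)   # points to stay away from
    ref = position[ref_idx]
    keep = np.zeros(data.shape[0], dtype=bool)
    for start in range(0, cand_idx.size, chunk):
        idx = cand_idx[start:start + chunk]
        # pairwise differences for this chunk, shape (len(idx), len(ref), 3)
        diff = position[idx][:, None, :] - ref[None, :, :]
        sq_dist = (diff ** 2).sum(axis=2)
        # assumption: a point's distance to itself does not count
        sq_dist[idx[:, None] == ref_idx[None, :]] = np.inf
        keep[idx] = (sq_dist > dmin ** 2).all(axis=1)
    return keep

# usage on random data like that in the question
rng = np.random.default_rng(0)
data = rng.random(2000)
position = rng.random((2000, 3))
mask = far_from_others(data, position, t1=0.5, t2=0.3, dmin=0.05)
data_filtered = data[mask]
```

For much larger inputs, a spatial index such as a k-d tree would probably scale better than any pairwise comparison, but that is a different technique from the one in this answer.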
