Let's say I have an array of data, and three other arrays giving the x, y, and z location of each data point in space:

data = np.random.random(10000)
x = np.random.random(10000)
y = np.random.random(10000)
z = np.random.random(10000)

Now, I want to get a subset of the data points which meet some criteria.

Specifically, I want the subset of data points which:

a) have a value greater than some threshold t1, and

b) are more than a distance d away from every data point that has a value greater than t2.

What is an efficient way of going about doing this?

hm8
  • This post has info on timing comparisons for calculating the distance: https://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy – Zulfiqaar Feb 14 '19 at 16:58

1 Answer

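I would use plain matrix multiplication to get the squared pairwise distances, via |a - b|^2 = |a|^2 + |b|^2 - 2 a·b, and compare those against the squared threshold (the dot product on its own is not a distance):

import numpy as np

data = np.random.random(10000)
x = np.random.random(10000)
y = np.random.random(10000)
z = np.random.random(10000)
position = np.vstack((x, y, z)).T  # shape (N, 3)
t1 = 0.5
t2 = 0.3
dmin = 0.1

m1 = data > t1               # candidates: value above t1
cand = position[m1]
ref = position[data > t2]    # points the candidates must stay away from
# squared distances between every candidate and every reference point
d2 = (cand**2).sum(axis=1)[:, None] + (ref**2).sum(axis=1)[None, :] - 2 * np.matmul(cand, ref.T)
# keep candidates farther than dmin from all reference points
# (a point is at distance 0 from itself, so with t2 <= t1 nothing passes
# unless the self-pairs are masked out first)
m2 = (d2 > dmin**2).all(axis=1)
data_filtered = data[m1][m2]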
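
For larger inputs the full candidate-by-reference matrix can get big, so one option is to process the candidates in chunks. What follows is only a minimal sketch, not a tuned implementation. It assumes squared distances are computed from the expansion |a - b|^2 = |a|^2 + |b|^2 - 2 a·b. The names `select_isolated`, `chunk` and `ignore_self` are made up for illustration and do not come from the question. Setting `ignore_self=True` assumes a point should not be compared with itself, which the question leaves open.

```python
import numpy as np

def select_isolated(data, position, t1, t2, dmin, chunk=1024, ignore_self=True):
    """Return indices i with data[i] > t1 that are farther than dmin from
    every point whose value exceeds t2 (optionally skipping i itself)."""
    cand = np.flatnonzero(data > t1)   # indices of candidate points
    ref = np.flatnonzero(data > t2)    # indices of points to stay away from
    ref_pos = position[ref]
    ref_sq = np.einsum("ij,ij->i", ref_pos, ref_pos)  # |b|^2 per reference point
    keep = np.ones(cand.size, dtype=bool)
    for start in range(0, cand.size, chunk):
        idx = cand[start:start + chunk]
        p = position[idx]
        # squared distances for this chunk, shape (len(idx), len(ref))
        d2 = (np.einsum("ij,ij->i", p, p)[:, None]
              + ref_sq[None, :]
              - 2.0 * (p @ ref_pos.T))
        if ignore_self:
            # a point paired with itself has distance 0; mask those pairs out
            d2[idx[:, None] == ref[None, :]] = np.inf
        keep[start:start + chunk] = (d2 > dmin**2).all(axis=1)
    return cand[keep]
```

With the arrays above, `data[select_isolated(data, position, t1, t2, dmin)]` gives the filtered values. A smaller `chunk` lowers peak memory at the cost of more loop iterations.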
Tarifazo