np.where() to eliminate data, where coordinates are too close to each other

Question

I'm doing aperture photometry on a cluster of stars, and to get easier detection of background signal, I want to only look at stars further apart than n pixels (n=16 in my case). I have 2 arrays, xs and ys, with the x- and y-values of all the stars' coordinates: Using np.where I'm supposed to find the indexes of all stars, where the distance to all other stars is >= n

So far, my method has been a for-loop

import numpy as np

# Lists of coordinates w. values between 0 and 2000 for 5000 stars

xs = np.random.rand(5000)*2000
ys = np.random.rand(5000)*2000

# for-loop, wherein the np.where statement in  question is situated
n = 16
for i in range(len(xs)):
    index = np.where( np.sqrt( pow(xs[i] - xs,2) + pow(ys[i] - ys,2)) >= n)

Due to the stars being clustered pretty closely together, I expected a severe reduction in data, though even when I tried n=1000 I still had around 4000 datapoints left

I do not know how did you get those 4000 datapoints left. First, the way you are using the random is incorrect, that creates a 2D array where the second dimension has 5000 elements.. if you want 5000 elements between 0 and 2000 you can do **np.random.rand(5000)*2000** . Also the np.where is not doing what you want. If you try to print index you will see it returns an array each time not a index. — Miguel, Sep 26 '19 at 10:53
Changed the random datapoints to actually work. I'm not restricted to numpy per se, but I prefer keeping stuff as simple as possible — Peter Juelsgaard, Sep 26 '19 at 11:05

Daniel F · Answer 1 · 2019-09-26T13:07:19.857

Using just numpy (and part of the answer here)

X = np.random.rand(5000,2) * 2000
XX = np.einsum('ij, ij ->i', X, X)
D_squared = XX[:, None] + XX - 2 * X.dot(X.T)
out = np.where(D_squared.min(axis = 0) > n**2)

Using scipy.spatial.pdist

from scipy.spatial import pdist, squareform
D_squared = squareform(pdist(x, metric = 'sqeuclidean'))
out = np.where(D_squared.min(axis = 0) > n**2)

Using a KDTree for maximum fast:

from scipy.spatial import KDTree

X_tree = KDTree(X)
in_radius = np.array(list(X_tree.query_pairs(n))).flatten()
out = np.where(~np.in1d(np.arange(X.shape[0]), in_radius))

score 0 · Answer 2 · answered Sep 26 '19 at 11:06

np.random.seed(seed=1)
xs = np.random.rand(5000,1)*2000
ys = np.random.rand(5000,1)*2000

n = 16

mask = (xs>=0)

for i in range(len(xs)):
    if mask[i]:
        index = np.where( np.sqrt( pow(xs[i] - x,2) + pow(ys[i] - y,2)) <= n)
        mask[index] = False
        mask[i] = True

x = xs[mask]
y = ys[mask]

print(len(x))

4220

score 0 · Answer 3 · answered Sep 26 '19 at 11:15

You can use np.subtract.outer for creating the pairwise comparisons. Then you check for each row whether the distance is below 16 for exactly one item (which is the comparison with the particular start itself):

distances = np.sqrt(
    np.subtract.outer(xs, xs)**2
    + np.subtract.outer(ys, ys)**2
)
indices = np.nonzero(np.sum(distances < 16, axis=1) == 1)

np.where() to eliminate data, where coordinates are too close to each other

3 Answers3