I have a 4-dimensional data set, X, which happens to be the iris dataset. I form a sub-list of 10 data points from this set, called mu. For each of these 10 points, I want the squared distance to its closest neighbor, and then the sum of those 10 smallest squared distances. The closest neighbor may be any data point in the original data set, not only a point in mu. How can I achieve this?

I think I could use something like this:

(np.array([min([np.linalg.norm(x-c)**2 for x in X]) for c in mu]))

But 'x' here wouldn't exclude the very point under consideration ('c'), would it?

John Zwinck

1 Answer

If it is safe to assume that your points are unique (so that no two points ever coincide exactly), you can filter out the points that are equal to c inside the list comprehension:

np.array([min([np.linalg.norm(x-c)**2 for x in X if not np.array_equal(x, c)]) for c in mu])

This, however, as a one-liner becomes a bit too long to read easily. I would therefore recommend a re-write in a PEP-8 compliant way:

res = np.empty(len(mu)) # allocate space for result
for i, c in enumerate(mu):
    res[i] = min([np.linalg.norm(x-c)**2 
                  for x in X if not np.array_equal(x, c)])

even though it is not quite as elegant as a one-liner.
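If the loop turns out to be too slow, the same filter can probably be written with NumPy broadcasting. The following is only a sketch under assumptions: random 4-D data stands in for the iris measurements, mu is taken as the first 10 rows of X, and the names sq_dist, same, res_vec and total are made up for illustration.

```python
import numpy as np

# Stand-in data (assumption): 150 random 4-D points instead of the iris set.
rng = np.random.default_rng(0)
X = rng.random((150, 4))
mu = X[:10]  # hypothetical choice of the 10-point sub-list

# Squared distance from every point in mu to every point in X,
# shape (len(mu), len(X)), computed by broadcasting.
diff = mu[:, None, :] - X[None, :, :]
sq_dist = np.sum(diff ** 2, axis=2)

# Same filter as np.array_equal(x, c): mark pairs whose coordinates
# are all identical and exclude them from the minimum.
same = np.all(mu[:, None, :] == X[None, :, :], axis=2)
sq_dist[same] = np.inf

res_vec = sq_dist.min(axis=1)  # nearest-neighbor squared distance per point in mu
total = res_vec.sum()          # sum over the points in mu
```

As with the list comprehension, exact duplicates of a point are skipped along with the point itself, so this relies on the same uniqueness assumption. It also builds a len(mu) x len(X) x 4 intermediate array, which is fine for a data set this small but may not scale to large inputs.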

JohanL