I've tried to run this little code, it just takes random points (here 50k, near of what I have in reality) and returns the 10th nearest points for each point randomly selected.
But unfortunately this is (really !) long because of the loop for sure.
As I'm pretty new in 'code optimization', is there a trick to make this mush faster ? (Faster at Python scale I know I'm not coding in C++).
Here is a reproducible example with data size close to what I have:
import time
import numpy as np
from numpy import random
from scipy.spatial import distance
# USEFUL FUNCTION
start_time = time.time()
def closest_node(node, nodes):
nodes = np.asarray(nodes)
deltas = nodes - node
dist_2 = np.einsum("ij,ij->i", deltas, deltas)
ndx = dist_2.argsort()
return data[ndx[:10]]
# REPRODUCIBLE DATA
mean = np.array([0.0, 0.0, 0.0])
cov = np.array([[1.0, -0.5, 0.8], [-0.5, 1.1, 0.0], [0.8, 0.0, 1.0]])
data = np.random.multivariate_normal(mean, cov, 500000)
# START RUNNING
points = data[np.random.choice(data.shape[0], int(np.round(0.1 * len(data), 0)))]
print(len(points))
for w in points:
closest_node(w, data)
print("--- %s seconds ---" % (time.time() - start_time))