You can speed this up by converting to NumPy first, then using vector operations, instead of doing the work in a loop, then converting to NumPy. Something like this:
    trainingArray = np.array(trainingSet)
    distances = np.sqrt((testInstance[0] - trainingArray[:, 0]) ** 2 +
                        (testInstance[1] - trainingArray[:, 1]) ** 2)
(That's obviously untested, because without enough context to know what's actually in those variables I had to guess, but it will be something close to that.)
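To make the guess concrete, here's a self-contained sketch, assuming `trainingSet` is a list of (x, y) pairs and `testInstance` is a single (x, y) point (both shapes are assumptions—your actual data may differ):

```python
import numpy as np

# Made-up data matching the assumed shapes.
trainingSet = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
testInstance = (0.0, 0.0)

trainingArray = np.array(trainingSet)  # shape (N, 2)

# One vectorized Euclidean-distance computation replaces the whole loop.
distances = np.sqrt((testInstance[0] - trainingArray[:, 0]) ** 2 +
                    (testInstance[1] - trainingArray[:, 1]) ** 2)
print(distances)  # → [ 0.  5. 10.]
```

The key point is that each arithmetic op runs once over the whole array in C, instead of once per element in the Python interpreter.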
There are other things you can do to squeeze out a few extra %—try replacing the `** 2` with self-multiplication or the `np.sqrt` with `** .5`, or (probably best) replacing the whole thing with `np.hypot`. (If you don't know how to use `timeit`—or, even better, IPython and its `%timeit` magic—now is a great time to learn.)
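For instance, here's how you might compare the `np.sqrt` version against `np.hypot` with the stdlib `timeit` module (the array sizes and data here are invented purely for the benchmark):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
trainingArray = rng.random((100_000, 2))  # made-up shape, just for timing
testInstance = (0.5, 0.5)

def with_sqrt():
    return np.sqrt((testInstance[0] - trainingArray[:, 0]) ** 2 +
                   (testInstance[1] - trainingArray[:, 1]) ** 2)

def with_hypot():
    # np.hypot(dx, dy) computes sqrt(dx**2 + dy**2) element-wise.
    return np.hypot(testInstance[0] - trainingArray[:, 0],
                    testInstance[1] - trainingArray[:, 1])

# Sanity check: both produce the same distances.
assert np.allclose(with_sqrt(), with_hypot())

print("sqrt :", timeit.timeit(with_sqrt, number=100))
print("hypot:", timeit.timeit(with_hypot, number=100))
```

Which variant wins depends on your NumPy build and hardware, so measure on your own machine rather than trusting anyone's numbers.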
But ultimately, this is just going to give you a constant-multiplier speedup of about an order of magnitude. Maybe it takes 15 minutes instead of 3 hours. That's nice, but… why is it taking 3 hours in the first place? What you're doing here should be taking on the order of seconds, or even less. There's clearly something much bigger wrong here, like maybe you're calling this function N**2 times when you think you're only calling it once. And you really need to fix that part.
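A hypothetical illustration of that accidental-blowup pattern (the function names and call sites here are invented—the point is only the call count):

```python
import numpy as np

def distances_to(point, trainingArray):
    # Correct and fast per call: all N distances at once.
    return np.hypot(point[0] - trainingArray[:, 0],
                    point[1] - trainingArray[:, 1])

trainingArray = np.arange(10.0).reshape(5, 2)

# Accidental N**2 work: calling the N-distance helper once per point,
# e.g. from inside another loop you forgot about.
slow = [distances_to(p, trainingArray) for p in trainingArray]  # N calls

# Intended usage: one call per query point.
fast = distances_to(trainingArray[0], trainingArray)            # 1 call
```

If your profiler shows the distance function being called millions of times, that outer loop is the real bug, not the arithmetic inside.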
Of course it's still worth doing this. First, element-wise operations are simpler and more readable than loops, and harder to get wrong. Second, even if you reduce the whole program to 3.8 seconds, you'll be happy for the order-of-magnitude speedup to 0.38 seconds, right?