
In my project I need to compute the Euclidean distance between each pair of points stored in an array. The input array is a 2D NumPy array with 3 columns, which are the coordinates (x, y, z), and each row defines a new point.

I usually work with 5000–6000 points in my test cases.

My first algorithm uses Cython and my second NumPy. I find that my NumPy algorithm is faster than the Cython one.

Edit: with 6000 points:

NumPy: 1.76 s / Cython: 4.36 s

Here's my Cython code:

cimport cython
import numpy as np
from libc.math cimport sqrt

@cython.boundscheck(False)
@cython.wraparound(False)
cdef void calcul1(double[::1] M, double[::1] R):
    # M is the flattened (n*3,) coordinate array, R the preallocated output.
    cdef int i = 0
    cdef int n = M.shape[0]
    cdef int x, y
    cdef int start = 3  # each point occupies 3 consecutive entries of M

    for x in range(0, n, 3):
        for y in range(start, n, 3):
            R[i] = sqrt((M[y] - M[x])**2 + (M[y+1] - M[x+1])**2 + (M[y+2] - M[x+2])**2)
            i += 1
        start += 3  # only the pairs after x are needed (upper triangle)

M is a memoryview of the initial input array, flattened by NumPy's flatten() before the call to calcul1(); R is a memoryview of a 1D output array used to store all the results.

Here's my NumPy code:

import numpy as np

def calcul2(M):
    return np.sqrt(((M[:, :, np.newaxis] - M[:, np.newaxis, :])**2).sum(axis=0))

Here M is the initial input array, transposed by NumPy's transpose() before the function call so that the coordinates (x, y, z) are the rows and the points are the columns.

Moreover, this NumPy function is quite convenient because the array it returns is well organized. It's an n-by-n array, with n the number of points, and each point has a row and a column. So, for example, the distance AB is stored at the intersection of row A and column B.
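For example, looking up a distance is just an index into the result (a minimal sketch; `a` and `b` are hypothetical point indices and `M` is the original (n, 3) array):

import numpy as np

Mt = M.transpose()   # coordinates as rows, shape (3, n)
D = calcul2(Mt)      # D has shape (n, n)
a, b = 10, 42        # hypothetical point indices
dist_ab = D[a, b]    # equals D[b, a]; D[a, a] is 0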

Here's how I call them (the caller is itself a Cython function):

cpdef test():
    cdef double[::1] Mf
    cdef double[::1] out = np.empty(17997000, dtype=np.float64)  # (6000**2 - 6000) // 2 pairs

    M = np.arange(6000*3, dtype=np.float64).reshape(6000, 3)  # example array with 6000 points
    Mf = M.flatten()    # because my Cython algorithm needs a 1D array
    Mt = M.transpose()  # because my NumPy algorithm needs coordinates as rows

    calcul2(Mt)
    calcul1(Mf, out)

Am I doing something wrong here? For my project, neither is fast enough.

1. Is there a way to improve my Cython code in order to beat NumPy's speed?

2. Is there a way to improve my NumPy code to compute even faster?

3. Or any other solution, as long as it stays in Python/Cython (e.g. parallel computing, as in the sketch below)?
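For (3), a prange-based Cython kernel is one possible direction. This is only a rough, unbenchmarked sketch (the function name is hypothetical, it assumes the extension is compiled with OpenMP support, e.g. -fopenmp, and it fills the full n-by-n matrix rather than just the upper triangle):

# cython: boundscheck=False, wraparound=False
from cython.parallel import prange
from libc.math cimport sqrt

def calcul1_parallel(double[:, ::1] P, double[:, ::1] D):
    # P is the original (n, 3) point array, D a preallocated (n, n) output.
    cdef Py_ssize_t n = P.shape[0]
    cdef Py_ssize_t i, j
    cdef double dx, dy, dz
    for i in prange(n, nogil=True, schedule='static'):
        for j in range(n):
            dx = P[i, 0] - P[j, 0]
            dy = P[i, 1] - P[j, 1]
            dz = P[i, 2] - P[j, 2]
            D[i, j] = sqrt(dx*dx + dy*dy + dz*dz)  # x*x avoids the pow() call of **2

Whether this actually beats the vectorized NumPy/SciPy versions would have to be measured.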

Thank you.

UserAt
  • If you don't need the distances and only care about the differences / ranking, then you could get rid of the sqrt, which should be the slowest part of your calculation. Maybe you could also use a faster sqrt, which is not as precise, or use some other metric (e.g. taxicab). – sascha May 18 '16 at 11:58
  • With 5000 to 6000 points, your matrix will have around 30 million entries. Computing a square root 30 million times is bound to be slow. Do you really need the full, dense matrix? What are you doing with the matrix after computing it? – Sven Marnach May 18 '16 at 12:01
  • How much faster is numpy than cython? – sebacastroh May 18 '16 at 12:08
  • Following sascha's advice, using `x*x` may be faster than `x**2` for you when computing L2 norms – AndyG May 18 '16 at 12:31
  • You're right, my problem is weird, and that's normal because it's in fact just an example meant to establish a methodology for the fastest/most optimized way to code in Python. I don't really care about the results; this case is useful just to compare how NumPy and Cython perform on the same problem with a lot of data. – UserAt May 18 '16 at 12:34
  • Did you see https://stackoverflow.com/questions/25213603/speeding-up-distance-matrix-computation-with-numpy-and-cython? Isn't it the same problem? – sebacastroh May 18 '16 at 12:35
  • See this answer using Numpy directly: http://stackoverflow.com/a/1401828/4323 – John Zwinck May 18 '16 at 13:00

1 Answer


Not sure where you are getting your timings, but you can use scipy.spatial.distance:

import numpy as np
import scipy.spatial.distance as sd

M = np.arange(6000*3, dtype=np.float64).reshape(6000, 3)
np_result = calcul2(M)
sp_result = sd.cdist(M.T, M.T)  # SciPy usage
np.allclose(np_result, sp_result)
>>> True

Timings:

%timeit calcul2(M)
1000 loops, best of 3: 313 µs per loop

%timeit sd.cdist(M.T, M.T)
10000 loops, best of 3: 86.4 µs per loop

Importantly, it's also useful to realize that your output is symmetric:

np.allclose(sp_result, sp_result.T)
>>> True

An alternative is to compute only the upper triangle of this array:

%timeit sd.pdist(M.T)
10000 loops, best of 3: 39.1 µs per loop

Edit: Not sure which index you want to zip; it looks like you may be doing it both ways? Zipping the other index for comparison:

%timeit sd.pdist(M)
10 loops, best of 3: 135 ms per loop

Still about 10x faster than your current NumPy implementation.
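
If the full n-by-n layout from the question is needed (distance AB at row A, column B), the condensed pdist output can be expanded with scipy.spatial.distance.squareform; a small sketch:

import numpy as np
import scipy.spatial.distance as sd

M = np.arange(6000*3, dtype=np.float64).reshape(6000, 3)

condensed = sd.pdist(M)            # 1D array with the n*(n-1)/2 upper-triangle distances
square = sd.squareform(condensed)  # symmetric (6000, 6000) matrix, zeros on the diagonal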

Daniel
  • Out of curiosity, what size of `M` did you use for these timings? – Sven Marnach May 18 '16 at 14:45
  • @SvenMarnach `(6000, 3)` as in the OP; I have updated my answer to make this clearer. – Daniel May 18 '16 at 15:12
  • Sorry, but I don't understand what `M.T` refers to. Is it the upper triangle of `M`? – UserAt May 18 '16 at 15:23
  • @UserAt `M.T` is just the transpose of `M`. So depending on whether you pass `M` or `M.T`, you obtain the Euclidean distances along different axes. The upper triangle is only returned by the `sd.pdist` example. – Daniel May 18 '16 at 17:24
  • I think there is something wrong. You said that sd.pdist(M) is still 10 times faster than my NumPy implementation, and I totally agree with that, since you got 135 ms and I have 1.76 s. But if `M` is (6000,3), why did your first `%timeit calcul2(M)` take only 313 µs? – UserAt May 19 '16 at 06:47
  • @UserAt It depends on which index you are looking at the Euclidean distance over. For the shortest timings we look at the Euclidean distance along the 6000-dim index, while for the 135 ms we look at the distance along the 3-dim index. – Daniel May 24 '16 at 14:36