sorting a two dimensional array based on one column

Question

I'd like to sort a two-dimensional array, based on for instance second column (if the rows are sorted based on from low to high all the other rows with the same index of this column get shuffled based on the new order in the second column). It is easy to implement it in python.

 d=np.array([[ 0.98807639,  0.17761071,  0.02576818],
            [ 0.90376256,  0.91729465,  0.42179004],
            [ 0.73540802,  0.38300233,  0.99331352],
            [ 0.99808863,  0.83837682,  0.16279504],
            [ 0.34154819,  0.6701753 ,  0.85538715],
            [ 0.15164261,  0.2007122 ,  0.80347646]])

data=np.array(sorted(d, key=lambda  l:l[1]))
data=np.array([[ 0.98807639,  0.17761071,  0.02576818],
               [ 0.15164261,  0.2007122 ,  0.80347646],
               [ 0.73540802,  0.38300233,  0.99331352],
               [ 0.34154819,  0.6701753 ,  0.85538715],
               [ 0.99808863,  0.83837682,  0.16279504],
               [ 0.90376256,  0.91729465,  0.42179004]])

but I need to do the same procedure in cython, in order to improve the speed of my code since numpy module is very slow. In c there is function qsort but I don't know how it should be implemented for 2d array, since I am not very familiar with pointer structure in c. How it should be done in cython to also speed up the code considerably for big arrays?

http://stackoverflow.com/questions/17202178/c-qsort-with-dynamic-n-by-2-multi-dimensional-array — Pieter21, Jul 29 '14 at 15:23
Why do you tag the question both C and C++? Which is it? The answers are very different. — Mark Ransom, Jul 29 '14 at 15:30

score 2 · Accepted Answer · answered Jul 29 '14 at 16:10

but I need to do the same procedure in cython, in order to improve the speed of my code since numpy module is very slow.

You're not really using the numpy module. Your command

data=np.array(sorted(d, key=lambda  l:l[1]))

uses a non-numpy lambda, and a pure-Python function sorted which constructs a Python list, and then finally after doing all that you make a new numpy array.

For an array as small as 6x3 you'd only get a factor of a couple by working in numpy -- the various overheads are too high -- but for a larger array, you can get significant benefits (here using argsort):

>>> d = np.random.random((10**6, 3))
>>> # slow method
>>> %timeit np.array(sorted(d, key=lambda  l:l[1]))
1 loops, best of 3: 2.56 s per loop
>>> # faster method
>>> %timeit d[d[:,1].argsort()]
1 loops, best of 3: 197 ms per loop

(Note that this only sorts by column #1; your title and code only refer to one column, so I'm ignoring your reference to "the new order in third column".)

well thanks for the comparison. I would like also to see how much coding it in `cython` can improve the speed of sorting. — Dalek, Jul 29 '14 at 17:40
Cython is a way of making Python code fast by turning it into C. The underlying argsort is itself written in C, so I'd be surprised if you got much of a benefit. Don't forget that it's also not enough to make it faster: you have to make it fast enough to outweigh the time you spend trying to optimize it. Your call, though. :^) — DSM, Jul 29 '14 at 17:52

sorting a two dimensional array based on one column

1 Answers1