9

Following this trick to grab unique entries for a NumPy array, I now have a two-column array, basically of pairs with first element in the range [0.9:0.02:1.1] and the second element in the range [1.5:0.1:2.0]. Let's call this A. Currently, it's completely unsorted, i.e.

In [111]: A
Out[111]: 
array([[ 1.1 ,  1.9 ],
       [ 1.06,  1.9 ],
       [ 1.08,  1.9 ],
       [ 1.08,  1.6 ],
       [ 0.9 ,  1.8 ],
       ...
       [ 1.04,  1.6 ],
       [ 0.96,  2.  ],
       [ 0.94,  2.  ],
       [ 0.98,  1.9 ]])

I'd like to sort it so that each row first increases in the second column, then the first. i.e.

array([[ 0.9 ,  1.5 ],
       [ 0.9 ,  1.6 ],
       [ 0.9 ,  1.7 ],
       [ 0.9 ,  1.9 ],
       [ 0.9 ,  1.9 ],
       [ 0.9 ,  2.  ],
       [ 0.92,  1.5 ],
       ...
       [ 1.08,  2.  ],
       [ 1.1 ,  1.5 ],
       [ 1.1 ,  1.6 ],
       [ 1.1 ,  1.7 ],
       [ 1.1 ,  1.8 ],
       [ 1.1 ,  1.9 ],
       [ 1.1 ,  2.  ]])

but I can't find a sort algorithm that gives both. As suggested here, I've tried A[A[:,0].argsort()] and A[A[:,1].argsort()], but they only sort one column each. I've also tried applying both but the same thing happens.

I apologize if I've missed something simple but I've been looking for this for a while now...

Community
  • 1
  • 1
Warrick
  • 697
  • 10
  • 19

3 Answers3

7

numpy.lexsort will work here:

A[np.lexsort(A.T)]

You need to transpose A before passing it to lexsort because when passed a 2d array it expects to sort by rows (last row, second last row, etc).

The alternative possibly slightly clearer way is to pass the columns explicitly:

A[np.lexsort((A[:, 0], A[:, 1]))]

You still need to remember that lexsort sorts by the last key first (there's probably some good reason for this; it's the same as performing a stable sort on successive keys).

ecatmur
  • 152,476
  • 27
  • 293
  • 366
  • Seems `np.lexsort`is quite a bit faster then `np.sort` on recarrays... So I would only add as a suggestion that the unique part can be done most efficiently after sorting. – seberg Sep 20 '12 at 14:39
  • lexsort actually sorts on the keys in order, but the last key sorted on will be the primary key, second to last the secondary key, etc. – Charles Harris Jan 25 '15 at 03:31
4

the following will work, but there might be a faster way:

A = np.array(sorted(A,key=tuple))
mgilson
  • 300,191
  • 65
  • 633
  • 696
  • @seberg -- hence "but there might be a faster way". This has the advantage that it is easy to read if speed isn't an issue. However, I'm a little surprised that it has more votes than the more "numpythonic" solution by ecatmur (but that'll probably change in time)... – mgilson Sep 19 '12 at 14:44
  • How large would be a problem? I'm using this for an array of 150,000 elements and the re-ordering isn't even noticeable. Besides, its not a repeated operation. That said, I expect to have tables of several million elements in the end. – Warrick Sep 20 '12 at 06:48
  • @Warrick I have to admit its faster then I would have guessed (though still best to just use lexsort), but there is another good reason against this if you arrays are really large, as this creates a list in between which will use much more RAM then the original array. – seberg Sep 20 '12 at 14:42
  • @seberg -- but the `lexsort` solution also creates an intermediate array of indices which I think would be comparable in size. – mgilson Sep 20 '12 at 14:45
  • @mgilson don't think so, because `sorted` gives you a list of array objects, quite an overhead considering there are only two numbers in each of the arrays. – seberg Sep 20 '12 at 14:47
  • @seberg -- ahh right, they are array objects, but they should only be views presumably. I don't know what the overhead of a view is compared to a `tuple` compared to whatever is returned by `lexsort` ... that's a reasonable point though. – mgilson Sep 20 '12 at 14:57
3

Just replace the whole thing (including the unique part with) for A being 2D:

A = np.ascontiguousarray(A) # just to make sure...
A = A.view([('', A.dtype)] * A.shape[1])

A = np.unique(A)
# And if you want the old view:
A = A.view(A.dtype[0]).reshape(-1,len(A.dtype))

I hope you are not using the set solution from the linked question unless you do not care too much about speed. The lexsort etc. is in generally great, but here not necessary since the default sort will do (if its a recarray)


Edit: A different view (with much the same result), but a bit more elegent maybe as no reshape is needed:

A = A.view([('', A.dtype, A.shape[0])])
A = np.unique(A)
# And to go back
A = A.view(A.dtype[0].base)
seberg
  • 8,785
  • 2
  • 31
  • 30
  • OK, I liked this method, but appearently at least at the moment `np.lexsort` is simply faster then `np.sort` on a recarray, so if efficiency is of extreme importance, I would say lexsort + custom unique (unique is very simple after sorting, check its python code) should beat everything by a lot. – seberg Sep 20 '12 at 14:45