42

I have a 2D numpy array of shape (N,2) which is holding N points (x and y coordinates). For example:

array([[3, 2],
       [6, 2],
       [3, 6],
       [3, 4],
       [5, 3]])

I'd like to sort it such that my points are ordered by x-coordinate, and then by y in cases where the x coordinate is the same. So the array above should look like this:

array([[3, 2],
       [3, 4],
       [3, 6],
       [5, 3],
       [6, 2]])

If this was a normal Python list, I would simply define a comparator to do what I want, but as far as I can tell, numpy's sort function doesn't accept user-defined comparators. Any ideas?


EDIT: Thanks for the ideas! I set up a quick test case with 1000000 random integer points, and benchmarked the ones that I could run (sorry, can't upgrade numpy at the moment).

Mine:   4.078 secs 
mtrw:   7.046 secs
unutbu: 0.453 secs
perimosocordiae
  • 17,287
  • 14
  • 60
  • 76

7 Answers7

64

Using lexsort:

import numpy as np    
a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])

ind = np.lexsort((a[:,1],a[:,0]))    

a[ind]
# array([[3, 2],
#       [3, 4],
#       [3, 6],
#       [5, 3],
#       [6, 2]])

a.ravel() returns a view if a is C_CONTIGUOUS. If that is true, @ars's method, slightly modifed by using ravel instead of flatten, yields a nice way to sort a in-place:

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])
dt = [('col1', a.dtype),('col2', a.dtype)]
assert a.flags['C_CONTIGUOUS']
b = a.ravel().view(dt)
b.sort(order=['col1','col2'])

Since b is a view of a, sorting b sorts a as well:

print(a)
# [[3 2]
#  [3 4]
#  [3 6]
#  [5 3]
#  [6 2]]
Community
  • 1
  • 1
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • Ah, I'd seen lexsort in the docs, but I couldn't figure out how it would apply to this problem. Thanks! – perimosocordiae Apr 25 '10 at 01:31
  • 12
    Yes, I often have difficulty understanding documentation. Examples tend to be far more illuminating. The trouble is, after playing with examples, I reread the docs and find out the docs were perfectly clear... :-) – unutbu Apr 25 '10 at 02:00
  • This is making a copy of the array, no? –  Feb 21 '11 at 10:31
  • 1
    @Noah: yes, this is making a new array. – unutbu Feb 21 '11 at 12:04
  • I'd be interested to know if it's possible to sort() a numpy array by multiple indices without making an intermediate copy. This is what numpy.ndarray.sort does, but only for record arrays AFAICT. –  Feb 21 '11 at 18:11
  • 2
    @Noah: I've modified my answer above to show how to sort a numpy array on multiple indices, in-place. – unutbu Feb 21 '11 at 20:47
  • just a warning: the in-place version doesn't seem to give the correct answer (not sure if there was always a problem or if something changed in numpy) – Noah Sep 09 '14 at 23:31
  • I get the columns sorted separately, so the x/y coordinate pairs are no longer paired correctly: `print(a)` yields `[[2 2], [3 3], [3 3], [4 5], [6 6]]` (using the example code from your answer) – Noah Sep 10 '14 at 04:48
  • @Noah: Thanks for the bug report. The problem was that my code assumed `a` had dtype `int32`. Your `a` has a different dtype, probably `int64`. I've modified the code above to handle any dtype. – unutbu Sep 10 '14 at 11:12
  • note the inplace sort does not work in general since ravel() *can* create an intermediate copy. I think in the particular way that the array is organized in OPs question allows you to do it in place but just don't try to apply this to a more general case without verifying that a ravel() can be done in place. Also, chaining a ravel with a view `b = a.ravel().view()` obfuscates the fact that `b` might be a view of `a` or a view of a copy of `a`. Sorry if this sounds pedantic, I just wanted to leave this warning for others :) – toes Aug 22 '15 at 17:50
  • @toes: You are absolutely right. Thanks for the correction. – unutbu Aug 22 '15 at 18:38
  • 1
    Note that `lexsort` uses the **last entry** in the sequence **as primary key**, the second-last as secondary etc. This caught me off-guard. This answer does the right thing, but it is easily overlooked. – LucasB Feb 25 '18 at 15:58
  • 1
    For an arbitrary 2D array: `np.lexsort(a.T[::-1])` – scleronomic Oct 16 '20 at 16:26
23

The title says "sorting 2D arrays". Although the questioner uses an (N,2)-shaped array, it's possible to generalize unutbu's solution to work with any (N,M) array, as that's what people might actually be looking for.

One could transpose the array and use slice notation with negative step to pass all the columns to lexsort in reversed order:

>>> import numpy as np
>>> a = np.random.randint(1, 6, (10, 3))
>>> a
array([[4, 2, 3],
       [4, 2, 5],
       [3, 5, 5],
       [1, 5, 5],
       [3, 2, 1],
       [5, 2, 2],
       [3, 2, 3],
       [4, 3, 4],
       [3, 4, 1],
       [5, 3, 4]])

>>> a[np.lexsort(np.transpose(a)[::-1])]
array([[1, 5, 5],
       [3, 2, 1],
       [3, 2, 3],
       [3, 4, 1],
       [3, 5, 5],
       [4, 2, 3],
       [4, 2, 5],
       [4, 3, 4],
       [5, 2, 2],
       [5, 3, 4]])
arekolek
  • 9,128
  • 3
  • 58
  • 79
4

The numpy_indexed package (disclaimer: I am its author) can be used to solve these kind of processing-on-nd-array problems in an efficient fully vectorized manner:

import numpy_indexed as npi
npi.sort(a)  # by default along axis=0, but configurable
Eelco Hoogendoorn
  • 10,459
  • 1
  • 44
  • 42
3

I was struggling with the same thing and just got help and solved the problem. It works smoothly if your array have column names (structured array) and I think this is a very simple way to sort using the same logic that excel does:

array_name[array_name[['colname1','colname2']].argsort()]

Note the double-brackets enclosing the sorting criteria. And off course, you can use more than 2 columns as sorting criteria.

JaimeBee
  • 41
  • 4
3

You can use np.complex_sort. This has the side effect of changing your data to floating point, I hope that's not a problem:

>>> a = np.array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])
>>> atmp = np.sort_complex(a[:,0] + a[:,1]*1j)
>>> b = np.array([[np.real(x), np.imag(x)] for x in atmp])
>>> b
array([[ 3.,  2.],
       [ 3.,  4.],
       [ 3.,  6.],
       [ 5.,  3.],
       [ 6.,  2.]])
mtrw
  • 34,200
  • 7
  • 63
  • 71
2

EDIT: removed bad answer.

Here's one way to do it using an intermediate structured array:

from numpy import array

a = array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])

b = a.flatten()
b.dtype = [('x', '<i4'), ('y', '<i4')]
b.sort()
b.dtype = '<i4'
b.shape = a.shape

print b

which gives the desired output:

[[3 2]
 [3 4]
 [3 6]
 [5 3]
 [6 2]]

Not sure if this is quite the best way to go about it though.

ars
  • 120,335
  • 23
  • 147
  • 134
  • That doesn't quite work, because it loses the association between x and y for my points. – perimosocordiae Apr 25 '10 at 00:04
  • Hm. When I run that, I get an error on the `b.shape = a.shape` line: "ValueError: total size of new array must be unchanged". I'm running Python 2.6.2, with numpy 1.2.1. – perimosocordiae Apr 25 '10 at 00:48
  • I'm running Python 2.5.4 with numpy 1.3.0. Try upgrading the version of numpy. – ars Apr 25 '10 at 00:54
1

I found one way to do it:

from numpy import array
a = array([(3,2),(6,2),(3,6),(3,4),(5,3)])
array(sorted(sorted(a,key=lambda e:e[1]),key=lambda e:e[0]))

It's pretty terrible to have to sort twice (and use the plain python sorted function instead of a faster numpy sort), but it does fit nicely on one line.

perimosocordiae
  • 17,287
  • 14
  • 60
  • 76