I am mainly interested in 2D arrays of shape Nx3 but the issue appears in arrays of shapes Nxm where m>1 as well. Specifically, I would like to sort an Nx3 array first based on its first column, then second, and finally third.
So, assuming that we have an array k given as
array([[0.90625, 0.90625, 0.15625],
[0.40625, 0.40625, 0.15625],
[0.40625, 0.90625, 0.65625],
[0.15625, 0.90625, 0.40625],
[0.90625, 0.40625, 0.90625],
[0.40625, 0.65625, 0.15625],
[0.40625, 0.65625, 0.65625],
[0.15625, 0.65625, 0.40625],
[0.65625, 0.15625, 0.90625],
[0.40625, 0.15625, 0.15625],
[0.40625, 0.90625, 0.40625],
[0.65625, 0.40625, 0.40625],
[0.15625, 0.15625, 0.90625],
[0.40625, 0.40625, 0.40625],
[0.65625, 0.90625, 0.40625],
[0.90625, 0.15625, 0.40625]])
the desired (sorted) array should be
array([[0.15625, 0.15625, 0.90625],
[0.15625, 0.65625, 0.40625],
[0.15625, 0.90625, 0.40625],
[0.40625, 0.15625, 0.15625],
[0.40625, 0.40625, 0.15625],
[0.40625, 0.40625, 0.40625],
[0.40625, 0.65625, 0.15625],
[0.40625, 0.65625, 0.65625],
[0.40625, 0.90625, 0.40625],
[0.40625, 0.90625, 0.65625],
[0.65625, 0.15625, 0.90625],
[0.65625, 0.40625, 0.40625],
[0.65625, 0.90625, 0.40625],
[0.90625, 0.15625, 0.40625],
[0.90625, 0.40625, 0.90625],
[0.90625, 0.90625, 0.15625]])
I thought I could achieve that by using np.lexsort, but I seem to be missing something, because it is not working as expected. So far, I've been doing the following:
In [28]: k[np.lexsort((k[:,2], k[:,1], k[:,0]))]
Out[28]:
array([[0.15625, 0.65625, 0.40625],
[0.15625, 0.15625, 0.90625],
[0.15625, 0.90625, 0.40625],
[0.40625, 0.65625, 0.65625],
[0.40625, 0.90625, 0.40625],
[0.40625, 0.15625, 0.15625],
[0.40625, 0.40625, 0.40625],
[0.40625, 0.90625, 0.65625],
[0.40625, 0.40625, 0.15625],
[0.40625, 0.65625, 0.15625],
[0.65625, 0.15625, 0.90625],
[0.65625, 0.90625, 0.40625],
[0.65625, 0.40625, 0.40625],
[0.90625, 0.40625, 0.90625],
[0.90625, 0.15625, 0.40625],
[0.90625, 0.90625, 0.15625]])
It seems that the first column is sorted properly but the others are not. A similar question was asked before but I believe the accepted answer (which is essentially what I am doing) does not work.
From what I understood after looking a little bit more into it, I think it has to do with the values of the array being floats.
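A minimal sketch of one way to test that suspicion: count how many distinct values a column really holds, before and after rounding. The column `col` below is a hypothetical stand-in built with `np.nextafter`, not the actual data; it holds three values that all print as 0.15625 at the default print precision but differ in their last bits.

```python
import numpy as np

# Hypothetical column: three neighbouring doubles around 0.15625.
# np.nextafter(x, y) returns the next representable float after x towards y.
col = np.array([
    0.15625,
    np.nextafter(0.15625, 0.0),  # one step below 0.15625
    np.nextafter(0.15625, 1.0),  # one step above 0.15625
])

# Number of distinct values as stored in memory.
n_raw = np.unique(col).size

# Number of distinct values after rounding to 5 decimals.
n_rounded = np.unique(np.round(col, 5)).size

print(n_raw, n_rounded)

# Printing with more digits than the default makes the differences visible.
with np.printoptions(precision=18):
    print(col)
```

If the raw count is larger than the rounded count, the column contains values that look identical when printed but compare as different, which is enough to change the order that np.lexsort produces.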
EDIT
I found the answer to my problem. However, I'll add it as an "edit" rather than posting it as an answer, because I believe this whole situation could have been avoided had I mentioned a fine detail about matrix k in my original post.
Matrix k is created from another matrix a, where a is created by reading a matrix of floats with 16 decimals from a file. Now let's look at the workflow that led me to the solution.
In [6]: k=a[[1,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60]]
In [7]: k
Out[7]:
array([[0.15625, 0.15625, 0.40625],
[0.15625, 0.40625, 0.15625],
[0.15625, 0.65625, 0.15625],
[0.15625, 0.90625, 0.15625],
[0.40625, 0.15625, 0.15625],
[0.40625, 0.40625, 0.15625],
[0.40625, 0.65625, 0.15625],
[0.40625, 0.90625, 0.15625],
[0.65625, 0.15625, 0.15625],
[0.65625, 0.40625, 0.15625],
[0.65625, 0.65625, 0.15625],
[0.65625, 0.90625, 0.15625],
[0.90625, 0.15625, 0.15625],
[0.90625, 0.40625, 0.15625],
[0.90625, 0.65625, 0.15625],
[0.90625, 0.90625, 0.15625]])
In [8]: np.random.shuffle(k)
In [9]: k
Out[9]:
array([[0.15625, 0.90625, 0.15625],
[0.90625, 0.40625, 0.15625],
[0.40625, 0.65625, 0.15625],
[0.90625, 0.90625, 0.15625],
[0.15625, 0.40625, 0.15625],
[0.65625, 0.15625, 0.15625],
[0.40625, 0.90625, 0.15625],
[0.65625, 0.65625, 0.15625],
[0.40625, 0.15625, 0.15625],
[0.90625, 0.65625, 0.15625],
[0.65625, 0.40625, 0.15625],
[0.15625, 0.65625, 0.15625],
[0.65625, 0.90625, 0.15625],
[0.15625, 0.15625, 0.40625],
[0.90625, 0.15625, 0.15625],
[0.40625, 0.40625, 0.15625]])
In [10]: k[np.lexsort((k[:,2],k[:,1],k[:,0]))]
Out[10]:
array([[0.15625, 0.40625, 0.15625],
[0.15625, 0.65625, 0.15625],
[0.15625, 0.90625, 0.15625],
[0.15625, 0.15625, 0.40625],
[0.40625, 0.65625, 0.15625],
[0.40625, 0.90625, 0.15625],
[0.40625, 0.15625, 0.15625],
[0.40625, 0.40625, 0.15625],
[0.65625, 0.15625, 0.15625],
[0.65625, 0.40625, 0.15625],
[0.65625, 0.65625, 0.15625],
[0.65625, 0.90625, 0.15625],
[0.90625, 0.15625, 0.15625],
[0.90625, 0.40625, 0.15625],
[0.90625, 0.65625, 0.15625],
[0.90625, 0.90625, 0.15625]])
In [11]: k=np.round(k, 5)
In [12]: k[np.lexsort((k[:,2],k[:,1],k[:,0]))]
Out[12]:
array([[0.15625, 0.15625, 0.40625],
[0.15625, 0.40625, 0.15625],
[0.15625, 0.65625, 0.15625],
[0.15625, 0.90625, 0.15625],
[0.40625, 0.15625, 0.15625],
[0.40625, 0.40625, 0.15625],
[0.40625, 0.65625, 0.15625],
[0.40625, 0.90625, 0.15625],
[0.65625, 0.15625, 0.15625],
[0.65625, 0.40625, 0.15625],
[0.65625, 0.65625, 0.15625],
[0.65625, 0.90625, 0.15625],
[0.90625, 0.15625, 0.15625],
[0.90625, 0.40625, 0.15625],
[0.90625, 0.65625, 0.15625],
[0.90625, 0.90625, 0.15625]])
In [13]: np.savetxt(sys.stdout, a[[1,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60]], fmt='%.18f')
0.156250000000000000 0.156250000000000000 0.406250000000000000
0.156249999999999972 0.406250000000000000 0.156250000000000028
0.156249999999999972 0.656250000000000000 0.156250000000000028
0.156249999999999972 0.906250000000000000 0.156250000000000028
0.406250000000000000 0.156249999999999972 0.156250000000000028
0.406250000000000000 0.406250000000000000 0.156250000000000028
0.406249999999999944 0.656250000000000000 0.156250000000000028
0.406249999999999944 0.906250000000000000 0.156250000000000028
0.656250000000000000 0.156249999999999972 0.156250000000000028
0.656250000000000000 0.406249999999999944 0.156250000000000028
0.656250000000000000 0.656250000000000000 0.156250000000000028
0.656250000000000000 0.906250000000000000 0.156250000000000056
0.906250000000000000 0.156249999999999972 0.156250000000000028
0.906250000000000000 0.406249999999999944 0.156250000000000028
0.906250000000000000 0.656250000000000000 0.156250000000000056
0.906250000000000000 0.906250000000000000 0.156250000000000056
As the output above shows, it was all a matter of rounding errors. Everything looked fine when printed with a few decimals, but when the file was read and matrix a was created, some values were stored with inaccuracies after the 16th decimal place. These inaccuracies were carried over to k when it was defined from a. Therefore, lexsort was giving the correct result from the beginning, given the actual numbers stored in the matrix. Everything worked as I expected once I rounded matrix k.
Moral of the story: Always check the accuracy of your values.
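For anyone who wants to keep the original values untouched, here is a minimal sketch of a variation on the fix: round only the sort keys and use the resulting order to index the unrounded array. The helper name `lexsort_rows` and the default of 5 decimals are my own assumptions for illustration, and the small array `k` is made-up data that mimics the problem (two entries sit one floating-point step below 0.15625).

```python
import numpy as np

def lexsort_rows(arr, decimals=5):
    """Order rows by column 0, then column 1, and so on,
    comparing values rounded to `decimals` places.
    Hypothetical helper, not part of NumPy."""
    keys = np.round(arr, decimals)
    # np.lexsort uses the LAST key as the primary one, so the
    # columns are passed in reverse order.
    order = np.lexsort(keys.T[::-1])
    return arr[order]

# Made-up data: `lo` prints as 0.15625 but is slightly smaller.
lo = np.nextafter(0.15625, 0.0)
k = np.array([
    [0.15625, 0.15625, 0.40625],
    [lo,      0.90625, 0.15625],
    [lo,      0.40625, 0.15625],
])

# Plain lexsort on the raw values puts the `lo` rows first.
plain_order = np.lexsort((k[:, 2], k[:, 1], k[:, 0]))

# Sorting on rounded keys treats all first-column values as equal.
sorted_k = lexsort_rows(k)

print(plain_order)
print(sorted_k)
```

Whether rounding the keys or rounding the data itself is the better choice depends on whether the tiny differences carry any meaning later on; in my case they did not.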