Actually, if you look into the source code of pandas DataFrame, you'll see that sort() is just a wrapper around sort_index() with different parameters, and, as @Jeff said in this question, sort_index() is the preferred method to use.
When you sort by only one column, sort_index() uses numpy.argsort() with the default kind='quicksort'. Quicksort is not stable, which is why your index looks shuffled.
But you can pass the kind parameter to sort_index() (one of 'mergesort', 'quicksort', 'heapsort'), so you can use a stable sort ('mergesort') for your task:
>>> mydf.sort_index(by=['stars'], ascending=False, kind='mergesort')
stars
17 5
11 5
6 5
1 5
19 4
18 4
15 4
14 4
7 4
5 4
2 4
10 3
8 3
4 3
16 2
12 2
9 2
3 2
13 1
0 1
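The tie order in the output above (17, 11, 6, 1 for five stars) is what you would expect if a stable ascending order is computed and then reversed. Here is a minimal sketch of that idea with plain NumPy. The array is hypothetical, and the reversal step is my assumption about how the descending order is produced, not something taken from the pandas source:

```python
import numpy as np

# Hypothetical data; positions 0..5 play the role of the DataFrame index.
stars = np.array([1, 5, 4, 5, 4, 1])

# Stable ascending sort: equal values keep their original relative order.
asc = np.argsort(stars, kind="mergesort")

# Reversing a stable ascending order gives a descending order in which
# tied rows appear in reverse original order (highest position first).
desc = asc[::-1]

print(asc.tolist())   # [0, 5, 2, 4, 1, 3]
print(desc.tolist())  # [3, 1, 4, 2, 5, 0]
```

With kind='quicksort' there is no such guarantee for ties, so the relative order of equal values can differ from run to run or between versions.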
sort_index() also uses mergesort (or counting sort) if there is more than one column in the by parameter. This has an interesting consequence; for example, you can do this:
>>> mydf.sort_index(by=['stars', 'stars'], ascending=False)
stars
1 5
6 5
11 5
17 5
2 4
5 4
7 4
14 4
15 4
18 4
19 4
4 3
8 3
10 3
3 2
9 2
12 2
16 2
0 1
13 1
Now the sort is stable, but within each stars value the index is in ascending order.
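If you want the same ordering (descending values, tied rows in their original order) at the NumPy level, one option is a stable argsort on the negated key. This is only a sketch with hypothetical data to illustrate the ordering; it is not a claim about what pandas does internally for multiple columns:

```python
import numpy as np

# Hypothetical data; positions 0..5 play the role of the DataFrame index.
stars = np.array([1, 5, 4, 5, 4, 1])

# Negating the key turns "descending by stars" into an ascending sort,
# and mergesort keeps tied rows in their original (ascending position) order.
order = np.argsort(-stars, kind="mergesort")

print(order.tolist())         # [1, 3, 2, 4, 0, 5]
print(stars[order].tolist())  # [5, 5, 4, 4, 1, 1]
```

Note that negation only works for numeric keys.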