How to find the index of a sorted series?

Question

I suspect I am misunderstanding something.

Problem: Given a series, I want to return a new series where the value at each row is the position that row would have if the series were sorted.

I posted a different question, and it seemed like argsort was the right solution. But after reading about argsort, I believe it is not. Here is what the doc says:

Returns the indices that would sort an array.

Here is an example:

>>> import numpy as np
>>> import pandas as pd
>>> test = pd.Series(np.random.randint(20, size=10), index=['red', 'green', 'yellow', 'purple', 'orange', 'white', 'black', 'pink', 'brown', 'gray'])
>>> test
red        2
green     17
yellow     8
purple    19
orange    12
white      0
black     15
pink       5
brown     14
gray      14

>>> test.argsort()
red       5
green     0
yellow    7
purple    2
orange    4
white     8
black     9
pink      6
brown     1
gray      3

But what I actually want is the position each color would have if the series were sorted. For example, here is test.sort_values():

>>> test.sort_values()
white      0
red        2
pink       5
yellow     8
orange    12
brown     14
gray      14
black     15
green     17
purple    19
dtype: int64

The argsort output makes sense, because test[test.argsort()] produces the same result as test.sort_values().

So what do I do to get something like this?

red       1
green     8
yellow    3
purple    9
orange    4
white     0
black     7
pink      2
brown     5
gray      6

This is similar to the question "Numpy argsort - what is it doing?", but I don't think the answers there covered what I want the function to do.

I hope this makes sense.

Second answer uses scipy and a lot of code which I was hoping to do with an existing function in pandas. I can't use scipy on my project right now. — Amir Raminfar, Dec 11 '17 at 19:42
Read - `using_indexed_assignment(x)`, `using_argsort_twice(x)`. — Divakar, Dec 11 '17 at 19:44
@AmirRaminfar: It's only a lot of code because it's showing 4 different ways to do it. Half the options are one-liners. — user2357112, Dec 11 '17 at 19:46
Yup, missed that. I thought all those needed to be executed together. — Amir Raminfar, Dec 11 '17 at 19:46

score 1 · Answer 1 · answered Dec 11 '17 at 19:37

We can use rank:

test.rank(method='first') - 1
Out[917]: 
red       1.0
green     8.0
yellow    3.0
purple    9.0
orange    4.0
white     0.0
black     7.0
pink      2.0
brown     5.0
gray      6.0
Name: tt, dtype: float64
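
Since rank returns floats, a cast gives integer positions. Below is a minimal sketch, assuming the values from the question are hard-coded rather than drawn at random; the variable name `positions` is only for illustration.

```python
import pandas as pd

# Same values as the example series in the question, hard-coded.
test = pd.Series(
    [2, 17, 8, 19, 12, 0, 15, 5, 14, 14],
    index=['red', 'green', 'yellow', 'purple', 'orange',
           'white', 'black', 'pink', 'brown', 'gray'],
)

# method='first' breaks ties by order of appearance, so every rank is a
# distinct whole number; subtracting 1 makes the ranks zero-based.
positions = test.rank(method='first').astype(int) - 1
print(positions.tolist())  # [1, 8, 3, 9, 4, 0, 7, 2, 5, 6]
```

The tied values (brown and gray, both 14) get 5 and 6 in the order they appear, which matches the output the question asks for.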

BENY

This helps a lot. I didn't realize `method` was a parameter. – Amir Raminfar Dec 11 '17 at 19:44
@AmirRaminfar Yw~:-) – BENY Dec 11 '17 at 19:51
Now which one is faster? :) – Amir Raminfar Dec 11 '17 at 19:58
@AmirRaminfar They are close, since you are calling `argsort` twice (with only a single argsort, numpy is the winner). – BENY Dec 11 '17 at 20:02

Amir Raminfar · Accepted Answer · 2017-12-13T19:03:46.587

Looks like I missed the answer in that post. Doing argsort twice is the best answer.

test.argsort().argsort()

Explanation:

The first argsort returns a permutation which, applied to the data, sorts it. Applying argsort to this (or any) permutation returns the inverse permutation: composing the two in either order gives the identity. The inverse permutation, applied to the sorted data, restores the original order, so its entry for each element is that element's position in the sorted data, i.e. its rank.
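
Here is a small sketch of that argument in plain numpy, assuming the values from the question are hard-coded; the names `order` and `ranks` are only for illustration. I pass kind='stable' so the two tied 14s keep their original order.

```python
import numpy as np

values = np.array([2, 17, 8, 19, 12, 0, 15, 5, 14, 14])

# order[i] is the original position of the i-th smallest value.
# kind='stable' keeps tied values (the two 14s) in their original order.
order = np.argsort(values, kind='stable')

# Applying the permutation sorts the data.
assert (values[order] == np.sort(values)).all()

# Argsort of a permutation is its inverse: ranks[j] is where element j
# lands in the sorted array.
ranks = np.argsort(order)

# Applying the inverse to the sorted data restores the original order.
assert (values[order][ranks] == values).all()

print(order.tolist())  # [5, 0, 7, 2, 4, 8, 9, 6, 1, 3]
print(ranks.tolist())  # [1, 8, 3, 9, 4, 0, 7, 2, 5, 6]
```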

%timeit test.argsort().argsort()
The slowest run took 7.49 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 146 µs per loop
%timeit test.rank(method='first').astype(int) - 1
1000 loops, best of 3: 234 µs per loop

This suggests that calling argsort() twice is faster than rank() on small data such as this 10-element series.
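
To check whether that holds at other sizes, something like the sketch below could be used. The sizes, the seed and the repeat count are arbitrary choices of mine, and the timings will vary by machine and library version, so I am not quoting any numbers. It uses kind='mergesort' for the equality check because the default sort is not guaranteed to keep ties in order.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

for n in (10, 1_000, 100_000):  # sizes picked for illustration only
    s = pd.Series(rng.integers(0, 20, size=n))

    # The two approaches agree when the first argsort is stable,
    # because rank(method='first') also breaks ties by appearance.
    by_argsort = s.argsort(kind='mergesort').argsort()
    by_rank = s.rank(method='first').astype(int) - 1
    assert (by_argsort.to_numpy() == by_rank.to_numpy()).all()

    # Time both one-liners over the same number of repetitions.
    t_argsort = timeit.timeit(lambda: s.argsort().argsort(), number=10)
    t_rank = timeit.timeit(
        lambda: s.rank(method='first').astype(int) - 1, number=10
    )
    print(f"n={n}: argsort twice {t_argsort:.4f}s, rank {t_rank:.4f}s")
```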
