I suspect I am misunderstanding something.

Problem: Given a series, I want to return a new series where the value at each row is the position that row would have if the series were sorted.

I posted a different question, and it seemed like argsort was the right solution. But after reading about argsort, I believe it is not. Here is what the doc says:

Returns the indices that would sort an array.

Here is an example:

>>> import numpy as np
>>> import pandas as pd
>>> test = pd.Series(np.random.randint(20, size=10), index=['red', 'green', 'yellow', 'purple', 'orange', 'white', 'black', 'pink', 'brown', 'gray'])
>>> test
red        2
green     17
yellow     8
purple    19
orange    12
white      0
black     15
pink       5
brown     14
gray      14

>>> test.argsort()
red       5
green     0
yellow    7
purple    2
orange    4
white     8
black     9
pink      6
brown     1
gray      3

But what I actually want is the position each color would have if the series were sorted. For example, here is test.sort_values():

>>> test.sort_values()
white      0
red        2
pink       5
yellow     8
orange    12
brown     14
gray      14
black     15
green     17
purple    19
dtype: int64

This makes sense because it produces the same result as test[test.argsort()].

So what do I do to get something like this?

red       1
green     8
yellow    3
purple    9
orange    4
white     0
black     7
pink      2
brown     5
gray      6

This is similar to the question "Numpy argsort - what is it doing?", but I don't think that question ever got an answer that does what I want here.

I hope this makes sense.

Amir Raminfar

2 Answers

We can use rank:

test.rank(method='first') - 1
Out[917]: 
red       1.0
green     8.0
yellow    3.0
purple    9.0
orange    4.0
white     0.0
black     7.0
pink      2.0
brown     5.0
gray      6.0
Name: tt, dtype: float64
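
Since rank returns floats, a cast is needed to get integer positions. The following is a minimal sketch that rebuilds the series from the values printed in the question (the original was random, so the literal values here are copied from that output) and assumes there are no NaN values, which would make the integer cast fail:

```python
import pandas as pd

colors = ['red', 'green', 'yellow', 'purple', 'orange',
          'white', 'black', 'pink', 'brown', 'gray']
# Values copied from the question's printed output
test = pd.Series([2, 17, 8, 19, 12, 0, 15, 5, 14, 14], index=colors)

# method='first' breaks ties by order of appearance, so brown (14)
# is placed before gray (14); subtracting 1 makes positions zero-based
positions = test.rank(method='first').astype(int) - 1
print(positions)
```

With method='first', every row gets a distinct position even when values tie, which is what the desired output in the question shows for brown and gray.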
BENY

Looks like I missed the answer in that post. Doing argsort twice is the best answer.

test.argsort().argsort()

Explanation:

The first argsort returns a permutation (which, if applied to the data, would sort it). When argsort is applied to this (or any) permutation, it returns the inverse permutation: applying the two permutations to each other in either order gives the identity. The second permutation, if applied to the sorted data array, would produce the unsorted data array, i.e. it is the rank.
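
The inverse-permutation claim can be checked directly with NumPy. This is a small sketch using the values printed in the question; the names order, ranks, and inverse are only illustrative, and kind='stable' is my choice so that ties are resolved by order of appearance:

```python
import numpy as np

# Values copied from the question's printed output
values = np.array([2, 17, 8, 19, 12, 0, 15, 5, 14, 14])

# order[i] is the original position of the i-th smallest value
order = np.argsort(values, kind='stable')
# ranks[j] is the sorted position of values[j]
ranks = np.argsort(order)

# order and ranks are inverse permutations of each other
identity = np.arange(len(values))
assert (order[ranks] == identity).all()
assert (ranks[order] == identity).all()

# Applying ranks to the sorted data recovers the original data
sorted_values = values[order]
assert (sorted_values[ranks] == values).all()

# The same inverse can be built without a second sort
inverse = np.empty_like(order)
inverse[order] = identity
assert (inverse == ranks).all()
```

The last few lines show an alternative to the second argsort: scattering 0..n-1 into the positions given by the first permutation yields the same ranks in linear time.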

%timeit test.argsort().argsort()
The slowest run took 7.49 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 146 µs per loop
%timeit test.rank(method='first').astype(int) - 1
1000 loops, best of 3: 234 µs per loop

This suggests the double argsort() is faster, at least on small data like this 10-element series.
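
The timing above only covers 10 elements, so it may be worth re-measuring at other sizes. Here is a rough sketch of how that could be done outside IPython; the helper names double_argsort and rank_first, the sizes, and the repeat count are all my own choices, and no particular result is implied. It uses kind='mergesort' for the first sort because the default sort is not guaranteed to be stable, so tied values could otherwise be ordered differently from rank(method='first'):

```python
import timeit

import numpy as np
import pandas as pd


def double_argsort(s):
    # A stable first sort keeps tied values in order of appearance
    return s.argsort(kind='mergesort').argsort()


def rank_first(s):
    # Zero-based integer positions via rank; assumes no NaN values
    return s.rank(method='first').astype(int) - 1


rng = np.random.default_rng(0)
for n in (10, 1_000, 100_000):
    s = pd.Series(rng.integers(0, 20, size=n))
    # Both approaches should agree, including on ties
    assert (double_argsort(s).to_numpy() == rank_first(s).to_numpy()).all()
    t_argsort = timeit.timeit(lambda: double_argsort(s), number=10)
    t_rank = timeit.timeit(lambda: rank_first(s), number=10)
    print(f"n={n}: argsort twice {t_argsort:.4f}s, rank {t_rank:.4f}s")
```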

Amir Raminfar