2

I am looking for an efficient way to do the following:

If my input is:

np.array([9,0,1,0,3,0])

I want my output to be:

np.array([0,3,2,3,1,3]) # 9 is the highest, so it gets rank 0
                        # 3 is the second highest, so it gets rank 1
                        # 1 is third highest, so it gets rank 2
                        # 0's are forth highest so they get rank 3

I am trying to apply the following to 2D matrix:

Input:

a = np.array([[9,0,1,0,3,0],
              [0,1,2,3,4,5],
              [0.01,0.3,2,100,1,1],
              [0,0,0,0,1,1],
              [4,4,4,4,4,4]])

Output:

>>> get_order_array(a)
array([[0, 3, 2, 3, 1, 3],
       [5, 4, 3, 2, 1, 0],
       [4, 3, 1, 0, 2, 2],
       [1, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0]])

I do can achieve the above with the following solution; however, I feel that its is very inefficient, so I was hoping that someone can suggest a better way to achieve my goal.

def get_order(x):
    unique_x = np.unique(x)
    step_1 = np.argsort(unique_x)[::-1]
    temp_dict = dict(zip(unique_x, step_1))
    return np.vectorize(temp_dict.get)(x)

def get_order_array(x):
    new_array = np.empty(x.shape, dtype=np.int)
    for i in xrange(x.shape[0]):
        new_array[i] = get_order(x[i])
    return new_array
Akavall
  • 82,592
  • 51
  • 207
  • 251

3 Answers3

1

A little cumsum magic goes a long way:

a_idx = np.argsort(a, axis=-1)[:, ::-1]
a_sorted = a[np.arange(a.shape[0])[:, None], a_idx]
a_diff = np.zeros_like(a_sorted, dtype=np.bool)
a_diff[:, 1:] = a_sorted[:, :-1] != a_sorted[:, 1:]
a_sorted_ranks = np.cumsum(a_diff, axis=1)
a_ranks = a_sorted_ranks[np.arange(a_sorted_ranks.shape[0])[:, None],
                         np.argsort(a_idx, axis=1)]
>>> a_ranks
array([[0, 3, 2, 3, 1, 3],
       [5, 4, 3, 2, 1, 0],
       [4, 3, 1, 0, 2, 2],
       [1, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0]])
Jaime
  • 65,696
  • 17
  • 124
  • 159
1

@Jaime's answer is great (as usual!). Here's an alternative, using scipy.stats.rankdata.

In rankdata's terminology, you want a "dense" ranking. You also want to rank the values in the opposite order than usual. To accomplish the reverse order, we'll pass -a to rankdata. We'll also subtract 1 from the ranking so the ranks begin at 0 instead of 1. Finally, you want to rank the rows of a two-dimensional array. rankdata works on one-dimensional data, so we'll have to loop over the rows.

Here's the code:

import numpy as np
from scipy.stats import rankdata


def get_order_array(a):
    b = np.empty(a.shape, dtype=int)
    for k, row in enumerate(a):
        b[k] = rankdata(-row, method='dense') - 1
    return b


if __name__ == "__main__":    
    a = np.array([[9,0,1,0,3,0],
                  [0,1,2,3,4,5],
                  [0.01,0.3,2,100,1,1],
                  [0,0,0,0,1,1],
                  [4,4,4,4,4,4]])
    print get_order_array(a)

Output:

[[0 3 2 3 1 3]
 [5 4 3 2 1 0]
 [4 3 1 0 2 2]
 [1 1 1 1 0 0]
 [0 0 0 0 0 0]]
Warren Weckesser
  • 110,654
  • 19
  • 194
  • 214
  • This is about twice as fast as @Jamie's solution, and using a library function makes the code simpler. Thank You. – Akavall Jun 20 '14 at 14:02
0

Basically:

order = a.argsort(axis=1)
ranks = order.argsort(axis=1)

And no, I did not come up with this clever answer myself. See:

Rank items in an array using Python/NumPy

There you also find a recipe if you want to have same rank for same numbers. (This one gives consecutive ranks, if there are duplicate numbers.)

Community
  • 1
  • 1
DrV
  • 22,637
  • 7
  • 60
  • 72
  • Maybe I missing something, but I don't see any solution there that answers my question. It is the duplicate values that make it hard. – Akavall Jun 19 '14 at 20:27
  • @Akavali You are right, I did not read it carefully enough. Jaime's solution above is probably close to the best possible. – DrV Jun 19 '14 at 22:28