
I am trying to sequentially rank the values of a multidimensional numpy array in Python, with ties sharing a rank, so that the result is an array of the same shape containing the ranks of the original data. Every occurrence of the same value should get the same rank, and ranks should run sequentially from 0 to max, lowest value to highest, with no gaps. In the example below, '-1' is the lowest value and occurs 2 times, so both occurrences get rank '0'. The next lowest value is '0.00', which occurs 4 times, so each of those gets rank '1', and so on.

So if my input array is this:

import numpy as np

a = np.array([[-1.0,   0.17, 0.89,   0.00],
              [ 0.12,  0.57, 0.42,   0.00],
              [ 0.38,  0.57, 0.00,   0.031],
              [ 0.036, 0.00, 0.021, -1.0]])

I want my output array to be this:

array([[ 0,  6, 10,  1],
       [ 5,  9,  8,  1],
       [ 7,  9,  1,  3],
       [ 4,  1,  2,  0]])

I’ve tried argsort and scipy.stats.rankdata, and each gets me part of what I need.

Argsort option: ranks sequentially but has no option for giving ties the same rank (at least not that I have found).

a.ravel().argsort().argsort().reshape(a.shape)


array([[ 0, 10, 15,  2],
       [ 9, 13, 12,  3],
       [11, 14,  4,  7],
       [ 8,  5,  6,  1]])

rankdata option: gives ties the same rank, but the ranks are no longer sequential (there are gaps after each group of ties).


from scipy.stats import rankdata

np.reshape(rankdata(a, method='min') - 1, a.shape)

array([[ 0, 10, 15,  2],
       [ 9, 13, 12,  2],
       [11, 13,  2,  7],
       [ 8,  2,  6,  0]])

Am I missing something obvious? Does anyone have a solution? The arrays I need to run this on are 1500 x 3600, so much larger than the example above.

Pamela G
  • Can you expand on the tie-breaking you need? – AMC Jan 19 '20 at 17:42
  • Does this answer your question? [Can numpy's argsort give equal element the same rank?](https://stackoverflow.com/questions/39059371/can-numpys-argsort-give-equal-element-the-same-rank) – Grzegorz Skibinski Jan 19 '20 at 17:49
  • Regarding the tie-breaking: I need every occurrence of the same value in the array to be given the same rank. In the example I provided above, the lowest value in the array would be assigned the first rank. Since '-1' is the lowest value in the array and there are 2 occurrences of '-1', they would each be assigned the '0' rank. The next lowest value is '0.00'; there are 4 occurrences of '0.00', so they would each be assigned the '1' rank. Any duplicate values in the array are given the same rank, and ranks are built sequentially from lowest to highest. – Pamela G Jan 19 '20 at 18:55
  • As far as I can tell, the "Can numpy's argsort give equal element the same rank?" answer only addresses alternate ways to carry out rankdata with the 'min' option. Although this approach gives ties the same rank, it does not create ranks in sequential order, which is what I need. Rankdata with the 'dense' option does what I need - see answer below. – Pamela G Jan 20 '20 at 18:00

1 Answer

Well, I found an answer using scipy.stats.rankdata by setting the method to 'dense' instead of 'min', as below.

from scipy.stats import rankdata

np.reshape(rankdata(a, method='dense') - 1, a.shape)

array([[ 0,  6, 10,  1],
       [ 5,  9,  8,  1],
       [ 7,  9,  1,  3],
       [ 4,  1,  2,  0]])

I would still like to know if there is a solution using argsort without having to go through scipy.
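For what it's worth, here is a minimal NumPy-only sketch of how that might look. The function names dense_rank_argsort and dense_rank_unique are made up for illustration and are not part of NumPy. The first sorts the flattened array with argsort, marks each position where the sorted value changes, and takes a cumulative sum of those marks to get gap-free ranks. The second leans on the inverse indices from np.unique, which should give the same result. I have not benchmarked either on a 1500 x 3600 array, and NaN values are assumed to be absent (each NaN would likely end up with its own rank in the argsort version).

```python
import numpy as np


def dense_rank_argsort(a):
    """Dense 0-based ranks via argsort (hypothetical helper, not a NumPy API)."""
    a = np.asarray(a)
    flat = a.ravel()
    if flat.size == 0:
        return np.zeros(a.shape, dtype=np.intp)
    # Indices that would sort the flattened data; the order among equal
    # values does not matter because ties receive the same rank anyway.
    order = np.argsort(flat)
    sorted_vals = flat[order]
    # 1 wherever the sorted value differs from its predecessor, else 0.
    steps = np.concatenate(([0], sorted_vals[1:] != sorted_vals[:-1]))
    # Running total of the steps is the dense rank of each sorted element.
    ranks_sorted = np.cumsum(steps)
    # Scatter the ranks back to the original positions.
    ranks = np.empty(flat.size, dtype=np.intp)
    ranks[order] = ranks_sorted
    return ranks.reshape(a.shape)


def dense_rank_unique(a):
    """Same result using the inverse indices returned by np.unique."""
    a = np.asarray(a)
    _, inverse = np.unique(a, return_inverse=True)
    # Reshape explicitly, since the shape of `inverse` differs across NumPy versions.
    return inverse.reshape(a.shape)


a = np.array([[-1.0,   0.17, 0.89,   0.00],
              [ 0.12,  0.57, 0.42,   0.00],
              [ 0.38,  0.57, 0.00,   0.031],
              [ 0.036, 0.00, 0.021, -1.0]])

print(dense_rank_argsort(a))
print(dense_rank_unique(a))
```

On the example array both functions should reproduce the rankdata(a, method='dense') - 1 result shown above.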

nucsit026
Pamela G