I am trying to rank the values of a multidimensional NumPy array in Python so that tied values share a rank and the ranks are sequential, producing an array of the same shape as the input. Every occurrence of a given value should receive the same rank, and the ranks should run from 0 for the lowest value up to the maximum for the highest value, with no gaps. In the example below, -1 is the lowest value and occurs 2 times, so both occurrences get rank 0. The next lowest value is 0.00, which occurs 4 times, so each of those gets rank 1, and so on.
So if my input array is this:
a = np.array([[-1.0, 0.17, 0.89, 0.00],
              [0.12, 0.57, 0.42, 0.00],
              [0.38, 0.57, 0.00, 0.031],
              [0.036, 0.00, 0.021, -1.0]])
I want my output array to be this:
array([[ 0, 6, 10, 1],
[ 5, 9, 8, 1],
[ 7, 9, 1, 3],
[ 4, 1, 2, 0]])
I’ve tried argsort and scipy.stats.rankdata, and each gets me part of what I need.
argsort option: ranks sequentially but has no option for handling ties (at least not one that I have found)
a.ravel().argsort().argsort().reshape(a.shape)
array([[ 0, 10, 15, 2],
[ 9, 13, 12, 3],
[ 11, 14, 4, 7],
[ 8, 5, 6, 1]])
rankdata option: takes care of the ties, but now the ranks are no longer sequential
np.reshape((rankdata(a, method='min') - 1), a.shape)
array([[ 0, 10, 15, 2],
[ 9, 13, 12, 2],
[ 11, 13, 2, 7],
[ 8, 2, 6, 0]])
Am I missing something obvious? Does anyone have a solution? The arrays I need to run this on have shape 1500 x 3600, so they are much larger than the example above.
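The ranking described above (ties share a rank, no gaps between ranks) is commonly called dense ranking. As a minimal sketch of one way it might be computed, assuming np.unique with return_inverse=True is acceptable here: the inverse indices it returns point into the sorted distinct values, which is the 0-based dense rank. The helper name dense_rank is purely illustrative and not part of NumPy or SciPy.

```python
import numpy as np


def dense_rank(arr):
    """Hypothetical helper: 0-based dense ranks with the same shape as arr."""
    # np.unique returns the sorted distinct values; with return_inverse=True
    # it also returns, for every element, the index of its value in that
    # sorted list. That index is the element's dense rank.
    _, inverse = np.unique(arr.ravel(), return_inverse=True)
    # The input was flattened above, so restore the original shape.
    return inverse.reshape(arr.shape)


a = np.array([[-1.0, 0.17, 0.89, 0.00],
              [0.12, 0.57, 0.42, 0.00],
              [0.38, 0.57, 0.00, 0.031],
              [0.036, 0.00, 0.021, -1.0]])

ranks = dense_rank(a)
print(ranks)
```

On the example array this should reproduce the desired output shown earlier. np.unique sorts the flattened data, so for a 1500 x 3600 array the cost is roughly one sort of 5.4 million values; this has not been timed here. scipy.stats.rankdata also accepts method='dense', which numbers ties the same way but starts from 1, so subtracting 1 and reshaping would be another option.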