2

Given a Time Series DataFrame is it possible to create a new DataFrame with the same dimensions but the values are the ranking for each row compared to other columns (ordered smallest value first)?

Example:

                 ABC      DEFG      HIJK       XYZ
date                                              
2018-01-14  0.110541  0.007615  0.063217  0.002543
2018-01-21  0.007012  0.042854  0.061271  0.007988
2018-01-28  0.085946  0.177466  0.046432  0.069297
2018-02-04  0.018278  0.065254  0.038972  0.027278
2018-02-11  0.071785  0.033603  0.075826  0.073270

The first row would become:

            ABC  DEFG  HIJK  XYZ
date                            
2018-01-14    4     2     3    1

as XYZ has the smallest value in that row and ABC the largest.

numpy.argsort looks like it might help however as it outputs the location itself I have not managed to get it to work.

Many thanks

Ewan
  • 14,592
  • 6
  • 48
  • 62

1 Answers1

3

Use double argsort for rank per rows and pass to DataFrame constructor:

df1 = pd.DataFrame(df.values.argsort().argsort() + 1, index=df.index, columns=df.columns)
print (df1)
            ABC  DEFG  HIJK  XYZ
date                            
2018-01-14    4     2     3    1
2018-01-21    1     3     4    2
2018-01-28    3     4     1    2
2018-02-04    1     4     3    2
2018-02-11    2     1     4    3

Or use DataFrame.rank with method='dense':

df1 = df.rank(axis=1, method='dense').astype(int)
print (df1)
            ABC  DEFG  HIJK  XYZ
date                            
2018-01-14    4     2     3    1
2018-01-21    1     3     4    2
2018-01-28    3     4     1    2
2018-02-04    1     4     3    2
2018-02-11    2     1     4    3
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252