1

I'm trying to rank order (row by row), from greatest to least, a multi-level column dataframe by specific columns (b_1 in this example). I'm hoping to do it in a way that would be dynamic, allowing for additional columns (both level 1 and level 2 in this example) and rows. Thank you.

I'm not sure if the best way would be to put columns b_1 into a separate dataframe, rank, and then drop back into the existing or if there is a way to add directly into the existing dataframe. enter image description here

Here's an example of the kind of data I'm working with.

import pandas as pd
import numpy as np

level_2 = ['a_1', 'b_1', 'c_1', 'a_1', 'b_1', 'c_1', 'a_1', 'b_1', 'c_1']
level_1 = ['A1', 'A1', 'A1', 'B1', 'B1', 'B1', 'C1', 'C1', 'C1']

data = [['a', 23, 'h', 'o', 45, 'v', 'a3', 1, 'b1'], ['b', 34, 'i', 'p', 3, 'w', 'a4', 32, 'b2'], ['c', 5, 'j', 'q', 7, 'x', 'a5', 6, 'b3'], ['d', 2, 'k', 'r', 5, 'y', 'a6', 76, 'b4'], ['e', 78, 'l', 's', 65, 'z', 'a7', 9, 'b5'], ['f', 98, 'm', 't', 23, 'a1',  'a8', 14, 'b6'], ['g', 3, 'n', 'u', 1, 'a2', 'a9', 45, 'b7']]

columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))
df1 = pd.DataFrame(data, columns=columns)

date = ['1/1/2023','1/2/2023','1/3/2023','1/4/2023','1/5/2023','1/6/2023','1/7/2023'] 

df1.insert(0, 'date', date)

df1.set_index('date', inplace=True)
Nick ODell
  • 15,465
  • 3
  • 32
  • 66
scstx
  • 53
  • 4
  • It is very difficult to answer your question without seeing any of your data nor any of the solution you have written which produces your problem. Please edit your question to show a minimal reproducible set consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example "Minimal Reproducible Example") for details on how to best help us help you. – itprorh66 Jun 13 '23 at 18:23
  • Example table has been added below the initial text. – scstx Jun 13 '23 at 18:30
  • ***DO NOT*** post images of code, data, error messages, etc. - copy or type the text into the question. Please reserve the use of images for diagrams or demonstrating rendering bugs, things that are impossible to describe accurately via text. For more information please see the Meta FAQ entry [Why not upload images of code/errors when asking a question?](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question/285557#285557) – itprorh66 Jun 13 '23 at 18:34
  • @scstx, can you make your existing table/df with the `pd.DataFrame` constructor (*i.e use* `df.to_dict()`) ? Also, what's the logic behind the rankings ? – Timeless Jun 13 '23 at 18:41
  • If I understand your question correctly, in practice the rankings are based on the standard deviation (which would be in column b_1) of multiple assets (A1, B1, etc). So, the actual dataframe is much larger than the one in the example, and would be rolling, by date. I haven't tried using df.to_dict. I'm not sure how I would apply it in this case. – scstx Jun 13 '23 at 18:53
  • What do you mean by *EXISTING* in the image ? Is it a table held in an Excel spreadsheet or a DataFrame ? You said you're using *a multi-level column dataframe*, so I suppose you're using [pandas](https://pandas.pydata.org/pandas-docs/stable/index.html) since you have a Python tag and you'll be able to use `df.to_dict()` and include its result **as text** to your question. Also, please clarify the logic of rankings. Why do you have always only three values `1`, `2` and `3` ? – Timeless Jun 13 '23 at 18:58
  • My apologies, I'm referring to a DataFrame. The rankings are based on the value in the b_1 columns in each row. So row 1/1/23, column B1-b_1 has the greatest value (45) so it's ranked one. In that same row, A1-b_1 has the second greatest value, so it's ranked two and C1-b_1 would be 3. Same logic for row 1/2/23. I hope this clarifies, thank you. – scstx Jun 13 '23 at 19:08
  • The logic is now clear but again, in your [previous](https://stackoverflow.com/questions/76435912/) question, you provided a [MRE](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) with the `pd.DataFrame` *constructor*. Can you do the same with this one ? – Timeless Jun 13 '23 at 19:49
  • Yes, I've added to the initial question. Thanks again. – scstx Jun 13 '23 at 20:05

1 Answers1

3

As a good start, you can try this :

ranks = (
    df.xs("b_1", axis=1, level=1, drop_level=False)
        .rank(axis=1, ascending=False)
        .rename(columns={"b_1": "NEW"})
        .astype(int)
)

out = pd.concat([df1, ranks], axis=1).sort_index(axis=1, level=0)

Output :

print(out)

          A1              B1              C1            
         NEW a_1 b_1 c_1 NEW a_1 b_1 c_1 NEW a_1 b_1 c_1
date                                                    
1/1/2023   2   a  23   h   1   o  45   v   3  a3   1  b1
1/2/2023   1   b  34   i   3   p   3   w   2  a4  32  b2
1/3/2023   3   c   5   j   1   q   7   x   2  a5   6  b3
1/4/2023   3   d   2   k   2   r   5   y   1  a6  76  b4
1/5/2023   1   e  78   l   2   s  65   z   3  a7   9  b5
1/6/2023   1   f  98   m   2   t  23  a1   3  a8  14  b6
1/7/2023   2   g   3   n   3   u   1  a2   1  a9  45  b7
Timeless
  • 22,580
  • 4
  • 12
  • 30