4

EDIT: To put in sample data df and expected output. EDIT 2: I've modified the data slightly so that the results are not uniformly largest number associated with 'cc' in each case.

My problem is:

  • I have a dataframe with two index columns that I am grouping by (Index1, Index2) and three columns (X, Y, Z)
  • I've created a groupby and applied a function to it (to scale all columns within the groupby object to 1)
  • I've summed across the resulting dataframe to get a sum for each row

The df is:

df = pd.DataFrame({'Index1': ['A', 'A', 'A', 'B', 'B', 'B'],
                'Index2': ['aa', 'bb', 'cc', 'aa', 'bb', 'cc'],
                'X': [1, 2, 7, 3, 6, 1],
                'Y': [2, 3, 6, 2, 4, 1],
                'Z': [3, 5, 9, 1, 2, 1]})

Then the code is:

df_scored = pd.DataFrame()   #new df to hold results
cats = [X, Y, Z]             #categories (columns of df) to be scaled
grouped = df.groupby([Index 1, Index 2]).sum()
for cat in cats :
    df_scored[cat] = grouped.groupby(level = 0)[cat].apply(lambda x: x / x.max())
df_scored['Score'] = df_scored.sum(axis = 1)

This produces:

                      X         Y         Z     Score
Index1 Index2                                        
A      aa      0.142857  0.333333  0.333333  0.809524
       bb      0.285714  0.500000  0.555556  1.341270
       cc      1.000000  1.000000  1.000000  3.000000
B      aa      0.500000  0.500000  0.500000  1.500000
       bb      1.000000  1.000000  1.000000  3.000000
       cc      0.166667  0.250000  0.500000  0.916667

Now I want to sort the resulting df_scored by each grouping of Index 1 (so that Index 2 is sorted by 'Score' within each group of Index 1), with this as the desired result:

                      X         Y         Z     Score
Index1 Index2                                        
A      cc      1.000000  1.000000  1.000000  3.000000
       bb      0.285714  0.500000  0.555556  1.341270
       aa      0.142857  0.333333  0.333333  0.809524
B      bb      1.000000  1.000000  1.000000  3.000000
       aa      0.500000  0.500000  0.500000  1.500000
       cc      0.166667  0.250000  0.500000  0.916667

How do I do this?

I've seen a few other questions on this here and here but not getting it to work for me in this case.

jbachlombardo
  • 141
  • 2
  • 13
  • 4
    Can you provide us with some reproducible code / data for testing? – jpp Feb 16 '18 at 02:48
  • 3
    Create some dummy data with multiindex and expected output. That will help us help you. – Scott Boston Feb 16 '18 at 03:00
  • @jp_data_analysis & Scott have tried to create the data / code / hoped for output above. This all works when I run it -- if I can clarify further please let me know – jbachlombardo Feb 16 '18 at 04:46

1 Answers1

6

Add this at the end of your code

df_scored.sort_values('Score', ascending= False).sort_index(level='Index1', sort_remaining=False)
Sociopath
  • 13,068
  • 19
  • 47
  • 75
  • Thanks -- I think that's sorting by the index though, not by the 'Score' column? If I fiddle with the data to change it so cc's results aren't the highest across the board, it's only sorting alphabetically by 'Index2', not numerically by the 'Score' column. – jbachlombardo Feb 16 '18 at 06:16
  • @jbachlombardo Edited the answer. Hope it helps. – Sociopath Feb 16 '18 at 06:39
  • 1
    Thank you! Works like a charm, much appreciated. – jbachlombardo Feb 16 '18 at 06:48