1

So I currently have the following dataframe which I aggregated and I have a multiindex that looks like the following:

Date     Country_Band      Value      Decimal
May 2021 Non-US            2-14       0.11
         US                2-14       0.22
                           1          0.33
                           15+        0.44
         Non-US            1          0.55
                           15+        0.66

I want to organize and group these in a way to obtain the below:

Date     Country_Band      Value      Decimal
May 2021 US                1          0.33
                           2-14       0.22
                           15+        0.44

         Non-US            1          0.55
                           2-14       0.11
                           15+        0.66

This is the index of a larger dataframe. I first tried to do the following code:

df_march_agg = df_march_agg.reindex(['US', 'Non-US'], level='Country_Band')

Which worked in getting the country band group, however, the value is still not in numerical order:

Date     Country_Band      Value      Decimal
May 2021 US                2-14       0.22
                           1          0.33 
                           15+        0.44

         Non-US            2-14       0.11
                           1          0.55
                           15+        0.66

I tried then doing the same:

df_march_agg = df_march_agg.reindex(['1', '2-14', '15+'], level='Value')

But this then undid the previous reindex. Any idea of what I am missing or need to add in order to get both in order?

Cheers!

Nhyi
  • 373
  • 1
  • 12
  • please provide your dataframes as dataframe constructors (read [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) the reason why) – mozway Jan 07 '22 at 13:08

1 Answers1

2

One idea with ordered categoricals in MultiIndex.set_levels, so possible use DataFrame.sort_index:

df.index = (df.index.set_levels(pd.CategoricalIndex(df.index.levels[1], 
                                                   ordered=True,
                                                   categories=['US', 'Non-US']), 
                                                   level=1)
                    .set_levels(pd.CategoricalIndex(df.index.levels[2], 
                                                   ordered=True, 
                                                   categories=['1', '2-14', '15+']), 
                                                   level=2))

df = df.sort_index()
print (df)
                             Decimal
Date     Country_Band Value         
May 2021 US           1         0.33
                      2-14      0.22
                      15+       0.44
         Non-US       1         0.55
                      2-14      0.11
                      15+       0.66

Another idea with DataFrame.reindex with MultiIndex.from_product:

mux = pd.MultiIndex.from_product([['May 2021'],
                                  ['US', 'Non-US'],
                                  ['1', '2-14', '15+']], 
                                  names=['Date','Country_Band','Value'])

df = df.reindex(mux)
print (df)
                             Decimal
Date     Country_Band Value         
May 2021 US           1         0.33
                      2-14      0.22
                      15+       0.44
         Non-US       1         0.55
                      2-14      0.11
                      15+       0.66
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This is exactly what I was looking for? Would you be able to explain why my way of chaining was incorrect? – Nhyi Jan 07 '22 at 13:19
  • @Nhyi - because working with levels separately, need set all levels in one `reindex` like second paragraph. – jezrael Jan 07 '22 at 13:20