-1

I'd like to add subtotal rows for index #1 (ie. Fruits and Animal) and subtotal columns for columns (ie. 2015 and 2016).

For the subtotal columns, I could do something like this, but it seems inefficient to run this type of code for each year (2015 & 2016). Is there a better way? I don't think 'margins' will work because there are multiple subtotals.

df[('2015','2015_Total')] = df[('2015','1st')]+df[('2015','2nd')]

For the subtotal rows (e.g., fruits total and animal total), I'm not sure where to begin.

enter image description here

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
Jason O.
  • 3,168
  • 6
  • 33
  • 72
  • Can you be able to add a piece of code to generate the dataframe and reach upto the point where you need help? – Abbas Apr 24 '16 at 06:18

1 Answers1

3

It is very complicated, because you need create Multiindex in columns and index.

Create subtotals is easy - use groupby with sum. Then create Multiindex and last concat new columns to original DataFrame. Last you have to sort_index (I add Total_ before value for correct sorting):

print df
              2015_____     2016_______    
                    1st 2nd         1st 2nd
Fruits Apple         10   9          11  10
       Banana        20  22          21  20
Animal Lion           5   3           2   1
       Tiger          2   3           5   0

df1 = df.groupby(level=0, axis=1).sum()
print df1
               2015_____  2016_______
Fruits Apple          19           21
       Banana         42           41
Animal Lion            8            3
       Tiger           5            5

print df.columns.get_level_values(0).to_series().drop_duplicates().tolist()
['2015_____', '2016_______']

#change index to multiindex
new_columns = zip(df.columns.get_level_values(0).to_series().drop_duplicates().tolist(),
                  "Total_" + df1.columns.str[:4])
print new_columns
[('2015_____', 'Total_2015'), ('2016_______', 'Total_2016')]

df1.columns = pd.MultiIndex.from_tuples(new_columns)
print df1
               2015_____ 2016_______
              Total_2015  Total_2016
Fruits Apple          19          21
       Banana         42          41
Animal Lion            8           3
       Tiger           5           5

df = pd.concat([df,df1], axis=1)
df2 = df.groupby(level=0, sort=False).sum()
print df2
      2015_____     2016_______      2015_____ 2016_______
             1st 2nd         1st 2nd Total_2015  Total_2016
Animal         7   6           7   1         13           8
Fruits        30  31          32  30         61          62

print df.index.levels[0][df.columns.labels[0]].to_series().drop_duplicates().tolist()
['Animal', 'Fruits']

#change index to multiindex
new_idx=zip(df.index.levels[0][df.columns.labels[0]].to_series().drop_duplicates().tolist(),
            "Total_" + df2.index )
print new_idx
[('Animal', 'Total_Animal'), ('Fruits', 'Total_Fruits')]

df2.index = pd.MultiIndex.from_tuples(new_idx)
print df2
                    2015_____     2016_______      2015_____ 2016_______
                          1st 2nd         1st 2nd Total_2015  Total_2016
Animal Total_Animal         7   6           7   1         13           8
Fruits Total_Fruits        30  31          32  30         61          62

df = pd.concat([df,df2])
df = df.sort_index(axis=1).sort_index()
print df
                    2015_____                2016_______               
                          1st 2nd Total_2015         1st 2nd Total_2016
Animal Lion                 5   3          8           2   1          3
       Tiger                2   3          5           5   0          5
       Total_Animal         7   6         13           7   1          8
Fruits Apple               10   9         19          11  10         21
       Banana              20  22         42          21  20         41
       Total_Fruits        30  31         61          32  30         62
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks!!!! I think you didn't mean to include df.columns.labels[0]... in new_idx=zip(df.index.levels[0][df.columns.labels[0]], right? – Jason O. Apr 24 '16 at 16:07
  • Yes, I think you can use it too for columns. I have to use this method, because there was problem - values of list were swapped. Btw, [solution which use labels with levels too](http://stackoverflow.com/questions/14189695/reset-index-for-dataframe-columns). – jezrael Apr 24 '16 at 16:09
  • 1
    With some adjustments to my problem, this answer helped me a lot! – Clayton Tosatti Sep 11 '18 at 15:25