11

Assuming I have the following dataframe:

a       b       c      Sce1     Sce2    Sce3    Sce4    Sce5    Sc6
Animal  Ground  Dog    0.0      0.9     0.5     0.0     0.3     0.4  
Animal  Ground  Cat    0.6      0.5     0.3     0.5     1.0     0.2 
Animal  Air     Eagle  1.0      0.1     0.1     0.6     0.9     0.1 
Animal  Air     Owl    0.3      0.1     0.5     0.3     0.5     0.9     
Object  Metal   Car    0.3      0.3     0.8     0.6     0.5     0.6 
Object  Metal   Bike   0.5      0.1     0.4     0.7     0.4     0.2 
Object  Wood    Chair  0.9      0.6     0.1     0.9     0.2     0.8 
Object  Wood    Table  0.9      0.6     0.6     0.1     0.9     0.7 

I want to create a MultiIndex, which will contain the sum of each lvl. The output will look like this:

a      b      c     Sce1    Sce2    Sce3    Sce4    Sce5    Sce6
Animal              1.9     1.6     1.4     1.3     2.7     1.6 
       Ground       0.6     1.4     0.8     0.5     1.3     0.6 
              Dog   0.0     0.9     0.5     0.0     0.3     0.4 
              Cat   0.6     0.5     0.3     0.5     1.0     0.2 
       Air          1.3     0.2     0.7     0.8     1.4     1.0 
              Eagle 1.0     0.1     0.1     0.6     0.9     0.1 
              Owl   0.3     0.1     0.5     0.3     0.5     0.9 
Object              2.6     1.6     1.8     2.3     2.0     2.3 
       Metal        0.8     0.3     1.1     1.3     0.9     0.8 
              Car   0.3     0.3     0.8     0.6     0.5     0.6 
              Bike  0.5     0.1     0.4     0.7     0.4     0.2 
       Wood         1.8     1.3     0.6     1.0     1.1     1.5 
              Chair 0.9     0.6     0.1     0.9     0.2     0.8 
              Table 0.9     0.6     0.6     0.1     0.9     0.7 

At the moment I am using a loop to create three different dataframes on each level and then manipulate them on excel, as below. So I wanted to take this calculation in python if possible.

for i in range range(0,3):
    df = df.groupby(list(df.columns)[0:lvl], as_index=False).sum()
    return df

Many thanks in advance.

piRSquared
  • 285,575
  • 57
  • 475
  • 624
pbou
  • 298
  • 3
  • 13
  • I think that there is no default way of doing it in pandas. Note that you can produce the groups including their sums with multilevel index with `df.groupby(['a','b','c']).sum()`. However, this is lacking the total per group. You should look at [this question](http://stackoverflow.com/questions/29413496/pandas-computing-total-sum-on-each-multiindex-sublevel) which is very similar to your problem. – Jan Trienes Apr 05 '17 at 18:54

2 Answers2

25

With some liberal use of MAGIC

pd.concat([
        df.assign(
            **{x: 'Total' for x in 'abc'[i:]}
        ).groupby(list('abc')).sum() for i in range(4)
    ]).sort_index()

                     Sce1  Sce2  Sce3  Sce4  Sce5  Sc6
a      b      c                                       
Animal Air    Eagle   1.0   0.1   0.1   0.6   0.9  0.1
              Owl     0.3   0.1   0.5   0.3   0.5  0.9
              Total   1.3   0.2   0.6   0.9   1.4  1.0
       Ground Cat     0.6   0.5   0.3   0.5   1.0  0.2
              Dog     0.0   0.9   0.5   0.0   0.3  0.4
              Total   0.6   1.4   0.8   0.5   1.3  0.6
       Total  Total   1.9   1.6   1.4   1.4   2.7  1.6
Object Metal  Bike    0.5   0.1   0.4   0.7   0.4  0.2
              Car     0.3   0.3   0.8   0.6   0.5  0.6
              Total   0.8   0.4   1.2   1.3   0.9  0.8
       Total  Total   2.6   1.6   1.9   2.3   2.0  2.3
       Wood   Chair   0.9   0.6   0.1   0.9   0.2  0.8
              Table   0.9   0.6   0.6   0.1   0.9  0.7
              Total   1.8   1.2   0.7   1.0   1.1  1.5
Total  Total  Total   4.5   3.2   3.3   3.7   4.7  3.9

I can get exactly what you asked for with

pd.concat([
        df.assign(
            **{x: '' for x in 'abc'[i:]}
        ).groupby(list('abc')).sum() for i in range(1, 4)
    ]).sort_index()

                     Sce1  Sce2  Sce3  Sce4  Sce5  Sc6
a      b      c                                       
Animal                1.9   1.6   1.4   1.4   2.7  1.6
       Air            1.3   0.2   0.6   0.9   1.4  1.0
              Eagle   1.0   0.1   0.1   0.6   0.9  0.1
              Owl     0.3   0.1   0.5   0.3   0.5  0.9
       Ground         0.6   1.4   0.8   0.5   1.3  0.6
              Cat     0.6   0.5   0.3   0.5   1.0  0.2
              Dog     0.0   0.9   0.5   0.0   0.3  0.4
Object                2.6   1.6   1.9   2.3   2.0  2.3
       Metal          0.8   0.4   1.2   1.3   0.9  0.8
              Bike    0.5   0.1   0.4   0.7   0.4  0.2
              Car     0.3   0.3   0.8   0.6   0.5  0.6
       Wood           1.8   1.2   0.7   1.0   1.1  1.5
              Chair   0.9   0.6   0.1   0.9   0.2  0.8
              Table   0.9   0.6   0.6   0.1   0.9  0.7

As for the how! I'll leave that as an exercise for the reader.

piRSquared
  • 285,575
  • 57
  • 475
  • 624
1

you need to do two group by to get subtotals at every level of aggregation. Then add those back in to the initial DF. Here's a related question.

Community
  • 1
  • 1
JD Long
  • 59,675
  • 58
  • 202
  • 294