0

I'm looking to have subtotals on my pivot table. I have already seen that the problem was similar to the one exposed here : Calculating subtotals in pandas pivot_table with MultiIndex

At first, I have this and I wish to obtain subtotals of population, broken down by region ; every region is composed of "départements" (counties).

pop_table = df.pivot_table(index= ["region", "departement"], values= "pop99", aggfunc= "sum") :

>>> pop_table
                        pop99
    region departement
    AL     67           1026120
           68            708025
    AQ     24            388293
           33           1287334
           40            327334
    ...                     ...
    RA     38           1094006
           42            728524
           69           1578869
           73            373258
           74            631679

    [94 rows x 1 columns]

So I took this code to solve my problem but it yields me two columns :

    >>> region_total = df.groupby("region")["pop99"].sum()
    >>> region_total.index = pd.MultiIndex.from_arrays([region_total.index + "_Total", \
    len(region_total.index)*[""]])
    >>> pd.concat([region_total, pop_table]).sort_index()
                       0      pop99
    region
    AL       67        NaN  1026120.0
             68        NaN   708025.0
    AL_Total     1734145.0        NaN
    AQ       24        NaN   388293.0
             33        NaN  1287334.0
    ...                ...        ...
    RA       42        NaN   728524.0
             69        NaN  1578869.0
             73        NaN   373258.0
             74        NaN   631679.0
    RA_Total     5645399.0        NaN

    [115 rows x 2 columns]

And I don't know how to get only one column, nor do I understand why it returns two columns since my problem is quite the same as the one I've linked. The second index "departement" also disappeared.

I thank you by advance.

  • `region_total = df.groupby("region")["pop99"].sum().reset_index(name='pop99')`. – Quang Hoang Mar 19 '21 at 13:40
  • @QuangHoang sorry it doesn't work. I get an error "TypeError: '<' not supported between instances of 'tuple' and 'int' " when I do `pd.concat([region_total, pop_table]).sort_index()` because the index of pop_table is a MultiIndex ( so a tuple) since it is a pivot table with `index= ["region, 'departement]` whereas the index of region_total is now a classical index (type int) – Dorian Godefroy Mar 19 '21 at 14:59
  • My suggestion only replace your first line. You still need to change your region total index to multiIndex. – Quang Hoang Mar 19 '21 at 15:00
  • Ok, at first I didn't get what you meant but I figured it out. I had to `reset_index` with region and then the `MultiIndex.from_arrays()` just like in the previous post. I thank you so much. I still quite consider myself a rookie with pandas so I get confused with MultiIndex stuff and how to create / handle them but anyway, I need to get harder into this. I wish you a very nice day @QuangHoang ! – Dorian Godefroy Mar 19 '21 at 15:26

0 Answers0