3

I am looking for a cleaner way to add subtotals to Pandas groupby.

Here is my DataFrame:

df = pd.DataFrame({
'Category':np.random.choice( ['Group A','Group B'], 50),
'Sub-Category':np.random.choice( ['X','Y'], 50),
'Product':np.random.choice( ['Product 1','Product 2'], 50),
'Units_Sold':np.random.randint(1,100, size=(50)),
'Dollars_Sold':np.random.randint(100,1000, size=50),
'Date':np.random.choice( pd.date_range('1/1/2011','03/31/2011',  
                      freq='D'), 50, replace=False)})

From there, I create a new Groupby Dataframe as such:

df1 = df.groupby(['Category','Sub-Category','Product',pd.TimeGrouper(key='Date',freq='M')]).agg({'Units_Sold':'sum','Dollars_Sold':'sum'}).unstack().fillna(0)

I would like to provide sub-totals for both Category & Sub-Category. I can do this using this code:

df2 = df1.groupby(level=[0,1]).sum()
df2.index = pd.MultiIndex.from_arrays([df2.index.get_level_values(0),
                                   df2.index.get_level_values(1) + ' Total',
                                   len(df2) * ['']])
df3 = df1.groupby(level=[0]).sum()
df3.index = pd.MultiIndex.from_arrays([df3.index.get_level_values(0) + ' Total',
                                   len(df3) * [''],
                                   len(df3) * ['']])
pd.concat([df1,df2,df3]).sort_index()

This gives me the DataFrame I want: Final DataFrame

My question - is there a more pythonic way to do this than to have to create a new DataFrame for each level then concat together? I have researched this, but can not find a better way. I have to do this for many different MultiIndex dataframes & am seeking a better solution.

Thanks in advance for your help!

EDIT WITH ADDITIONAL INFORMATION:

Thank you to both @Wen & @DaFanat for their replies. I attempted to use the link @Wen provided on my data [link]:Python (Pandas) Add subtotal on each lvl of multiindex dataframe

pd.concat([df.assign(\
    **{x: 'Total' for x in "CategorySub-CategoryProduct"[i:]}\
    ).groupby(list('abc')).sum() for i in range(1,4)])\
    .sort_index()

This sums the total, however it ignores the dates that make up the second level of the columns. It leaves me with this outcome.Resulting Image

I've tried to add in a TimeGrouper with the groupby, but that returns an error. Any help would be greatly appreciated. Thanks!

keg5038
  • 341
  • 3
  • 13
  • Here is one .https://stackoverflow.com/questions/43238183/python-pandas-add-subtotal-on-each-lvl-of-multiindex-dataframe – BENY Sep 25 '17 at 15:13
  • Following the comment above - here is another one with many methods. https://stackoverflow.com/questions/15570099/pandas-pivot-tables-row-subtotals – yoav_aaa Sep 25 '17 at 15:26
  • Yw, you should thanks PiR :) – BENY Sep 25 '17 at 17:48
  • @Wen thank you for that link - much appreciated! I've tried to take the example in that link & make it work with my data & unfortunately I can't get it to work. I will update the above to show my progress. Thanks again! – keg5038 Sep 25 '17 at 17:53

1 Answers1

1

I can get you a lot closer by aligning your attempt above with the example from @piRSquared.

The list must match the MultiIndex. Try this instead:

iList = ['Category','Sub-Category','Product']
pd.concat([
    df1.assign(
        **{x: '' for x in iList[i:]}
    ).groupby(iList).sum() for i in range(1,4)
]).sort_index()

It doesn't present the word "Total" in the right place, nor are the totals at the bottom of each group, but at least it's more-or-less functionally correct. My totals won't match because the values in the DataFrame are random.

It took me a while to work through the original answer provided in Python (Pandas) Add subtotal on each lvl of multiindex dataframe. But the same logic applies here.

The assign() replaces the values in the columns with what is in the dict that is returned by the dict comprehension executed over the elements of the list of MultiIndex columns.
Then groupby() only finds unique values for those non-blanked-out columns and sums them accordingly.
These groupbys are enclosed in a list comprehension, so pd.concat() then just combines these sets of rows.
And sort_index() puts the index labels in ascending order.
(Yes, you still get a warning about "both a column name and an index level," but it still works.)

Table with subtotals

George Luft
  • 61
  • 1
  • 4