Pandas: How to include all columns and all indexes for multiple pivot table

Question

I am trying to create pivot tables for different locations from the following dataframe (df):

Location	Category	Status	Price
1	Furniture	New	$100
1	Furniture	Old	$50
2	Office Supplies	New	$200
1	Furniture	New	$100
1	Office Supplies	New	$300
1	Office Supplies	Old	$150

First, I have filtered the dataframe to separate for location 1 & 2 with code:

df1 = df[df['Location'] == 1]

df2 = df[df['Location'] == 2]

Next I have used the standard pandas pivot table function:

pd.pivot_table(df1, values='Price', index='Status', columns='Category', aggfunc=np.sum)

pd.pivot_table(df2, values='Price', index='Status', columns='Category', aggfunc=np.sum)

So I have the following two pivot tables as output:

Location 1:

Status	Furniture	Office Supplies
New	$200	$300
Old	$50	$150

Location 2:

Status	Office Supplies
New	$200

However, I want the pivot table for location 2 to include all possible categories and statuses and have 0 if they are not present. To summarize, I want the following pivot table for location 2:

Location 2:

Status	Furniture	Office Supplies
New	$0	$200
Old	$0	$0

I have gone through all the options of the pivot_table() function, but haven't found anything to solve this so far.

SeaBean · Accepted Answer · 2021-08-26T21:07:12.460

You can create the pivot table before splitting by Location.

For pd.pivot_table:

set index with index=['Location', 'Status']
pass parameter dropna=False to allow all categories to show up for all Location and all Status even when empty. (Default is to hide empty entries).
pass parameter fill_value=0 to fill up NaN values as 0

Then locate Location from the pivot table by .loc, as follows:

df_out = pd.pivot_table(df, 
                        values='Price', 
                        index=['Location', 'Status'], 
                        columns='Category', 
                        aggfunc=np.sum, 
                        dropna=False, 
                        fill_value=0)

Result:

print(df_out)

Category         Furniture  Office Supplies
Location Status                            
1        New           200              300
         Old            50              150
2        New             0              200
         Old             0                0

Then, to get the pivot table only for Location 2, you can use .loc, as follows:

df2 = df_out.loc[2]

Output:

print(df2)

Category  Furniture  Office Supplies
Status                              
New               0              200
Old               0                0

Edit (for adding Total and Sub-total)

If you also want to include the Total (for all Location) and Sub-total (for each Location), you can also do it, as follows:

For pd.pivot_table:

pass parameter margins=True and margins_name='Total' to set up the margins for Total (grand total for all Location)
chain the command fillna(0, downcast='infer') after pd.pivot_table. This is to handle the glitch / bug of pd.pivot_table that the margin total will still show NaN for empty entry (e.g. Location=2 Status='Old' in this case) even when fill_value=0 parameter is specified.

df_out = pd.pivot_table(df, 
                        values='Price', 
                        index=['Location', 'Status'], 
                        columns='Category', 
                        aggfunc=np.sum, 
                        dropna=False, 
                        fill_value=0,
                        margins=True,
                        margins_name='Total'
                        ).fillna(0, downcast='infer')

Result

print(df_out)

Category         Furniture  Office Supplies  Total
Location Status                                   
1        New           200              300    500
         Old            50              150    200
2        New             0              200    200
         Old             0                0      0
Total                  250              650    900

Then, to add the Sub-total (for each Location), we further use:

(pd.concat([df_out, 
            df_out.query('Location != "Total"')
                  .groupby(level=0).sum()
                  .assign(Status='Sub-total')
                  .set_index('Status', append=True)])
   .sort_index())

Result:

Category            Furniture  Office Supplies  Total
Location Status                                      
1        New              200              300    500
         Old               50              150    200
         Sub-total        250              450    700
2        New                0              200    200
         Old                0                0      0
         Sub-total          0              200    200
Total                     250              650    900

This answer serves the primary purpose. Thanks a lot. However, the reason I filtered before applying the pivot_table() function was I wanted to include the total in margins. Applying the pivot table without filtering makes everything simpler, but now I don't have the totals for each location. Is there a way to do that? — Illusionista787, Aug 26 '21 at 19:45
@EkaraiGlass Oh, I see. You also set margins in the pivot table. For the subtotals, suggest you have a look at [this post](https://stackoverflow.com/questions/62605470/pandas-pivot-table-subtotals-with-multi-index) and may be [this post too](https://stackoverflow.com/questions/41979196/sub-total-in-pandas-pivot-table) — SeaBean, Aug 26 '21 at 20:04
@EkaraiGlass Yes, it's a very good solution. I'm also just going to edit my solution to include this part. — SeaBean, Aug 26 '21 at 20:22
@EkaraiGlass I've also customized the codes for adding sub-totals for you based on the reference post I passed you. You can take a look. :-) — SeaBean, Aug 26 '21 at 20:46

Pandas: How to include all columns and all indexes for multiple pivot table

1 Answers1

Edit (for adding Total and Sub-total)