3

I am trying to create pivot tables for different locations from the following dataframe (df):

Location Category Status Price
1 Furniture New $100
1 Furniture Old $50
2 Office Supplies New $200
1 Furniture New $100
1 Office Supplies New $300
1 Office Supplies Old $150

First, I have filtered the dataframe to separate for location 1 & 2 with code:

df1 = df[df['Location'] == 1]

df2 = df[df['Location'] == 2]

Next I have used the standard pandas pivot table function:

pd.pivot_table(df1, values='Price', index='Status', columns='Category', aggfunc=np.sum)

pd.pivot_table(df2, values='Price', index='Status', columns='Category', aggfunc=np.sum)

So I have the following two pivot tables as output:

Location 1:

Status Furniture Office Supplies
New $200 $300
Old $50 $150

Location 2:

Status Office Supplies
New $200

However, I want the pivot table for location 2 to include all possible categories and statuses and have 0 if they are not present. To summarize, I want the following pivot table for location 2:

Location 2:

Status Furniture Office Supplies
New $0 $200
Old $0 $0

I have gone through all the options of the pivot_table() function, but haven't found anything to solve this so far.

SeaBean
  • 22,547
  • 3
  • 13
  • 25

1 Answers1

5

You can create the pivot table before splitting by Location.

For pd.pivot_table:

  • set index with index=['Location', 'Status']
  • pass parameter dropna=False to allow all categories to show up for all Location and all Status even when empty. (Default is to hide empty entries).
  • pass parameter fill_value=0 to fill up NaN values as 0

Then locate Location from the pivot table by .loc, as follows:

df_out = pd.pivot_table(df, 
                        values='Price', 
                        index=['Location', 'Status'], 
                        columns='Category', 
                        aggfunc=np.sum, 
                        dropna=False, 
                        fill_value=0)

Result:

print(df_out)

Category         Furniture  Office Supplies
Location Status                            
1        New           200              300
         Old            50              150
2        New             0              200
         Old             0                0

Then, to get the pivot table only for Location 2, you can use .loc, as follows:

df2 = df_out.loc[2]

Output:

print(df2)

Category  Furniture  Office Supplies
Status                              
New               0              200
Old               0                0

Edit (for adding Total and Sub-total)

If you also want to include the Total (for all Location) and Sub-total (for each Location), you can also do it, as follows:

For pd.pivot_table:

  • pass parameter margins=True and margins_name='Total' to set up the margins for Total (grand total for all Location)
  • chain the command fillna(0, downcast='infer') after pd.pivot_table. This is to handle the glitch / bug of pd.pivot_table that the margin total will still show NaN for empty entry (e.g. Location=2 Status='Old' in this case) even when fill_value=0 parameter is specified.
df_out = pd.pivot_table(df, 
                        values='Price', 
                        index=['Location', 'Status'], 
                        columns='Category', 
                        aggfunc=np.sum, 
                        dropna=False, 
                        fill_value=0,
                        margins=True,
                        margins_name='Total'
                        ).fillna(0, downcast='infer')

Result

print(df_out)

Category         Furniture  Office Supplies  Total
Location Status                                   
1        New           200              300    500
         Old            50              150    200
2        New             0              200    200
         Old             0                0      0
Total                  250              650    900

Then, to add the Sub-total (for each Location), we further use:

(pd.concat([df_out, 
            df_out.query('Location != "Total"')
                  .groupby(level=0).sum()
                  .assign(Status='Sub-total')
                  .set_index('Status', append=True)])
   .sort_index())

Result:

Category            Furniture  Office Supplies  Total
Location Status                                      
1        New              200              300    500
         Old               50              150    200
         Sub-total        250              450    700
2        New                0              200    200
         Old                0                0      0
         Sub-total          0              200    200
Total                     250              650    900
SeaBean
  • 22,547
  • 3
  • 13
  • 25
  • This answer serves the primary purpose. Thanks a lot. However, the reason I filtered before applying the pivot_table() function was I wanted to include the total in margins. Applying the pivot table without filtering makes everything simpler, but now I don't have the totals for each location. Is there a way to do that? – Illusionista787 Aug 26 '21 at 19:45
  • 2
    @EkaraiGlass Oh, I see. You also set margins in the pivot table. For the subtotals, suggest you have a look at [this post](https://stackoverflow.com/questions/62605470/pandas-pivot-table-subtotals-with-multi-index) and may be [this post too](https://stackoverflow.com/questions/41979196/sub-total-in-pandas-pivot-table) – SeaBean Aug 26 '21 at 20:04
  • 1
    The first post solves the problem perfectly – Illusionista787 Aug 26 '21 at 20:20
  • 2
    @EkaraiGlass Yes, it's a very good solution. I'm also just going to edit my solution to include this part. – SeaBean Aug 26 '21 at 20:22
  • 2
    @EkaraiGlass I've also customized the codes for adding sub-totals for you based on the reference post I passed you. You can take a look. :-) – SeaBean Aug 26 '21 at 20:46
  • 1
    Thanks for such an elegant solution, @SeaBean – Illusionista787 Aug 31 '21 at 20:52