18

Pandas Pivot Table Dictionary of Agg function

I am trying to calculate 3 aggregative functions during pivoting:

  1. Count
  2. Mean
  3. StDev

This is the code:

n_page = (pd.pivot_table(Main_DF, 
                         values='SPC_RAW_VALUE',  
                         index=['ALIAS', 'SPC_PRODUCT', 'LABLE', 'RAW_PARAMETER_NAME'], 
                         columns=['LOT_VIRTUAL_LINE'],
                         aggfunc={'N': 'count', 'Mean': np.mean, 'Sigma': np.std})
          .reset_index()
         )

Error I am getting is: KeyError: 'Mean'

How can I calculate those 3 functions?

Alexander
  • 105,104
  • 32
  • 201
  • 196
Felix
  • 1,539
  • 8
  • 20
  • 35

3 Answers3

25

As written in approved answer by @Happy001, aggfunc cant take dict is false. we can actually pass the dict to aggfunc.

A really handy feature is the ability to pass a dictionary to the aggfunc so you can perform different functions on each of the values you select. for example:

import pandas as pd
import numpy as np

df = pd.read_excel('sales-funnel.xlsx')  #loading xlsx file

table = pd.pivot_table(df, index=['Manager', 'Status'], columns=['Product'], values=['Quantity','Price'],
           aggfunc={'Quantity':len,'Price':[np.sum, np.mean]},fill_value=0)
table

In the above code, I am passing dictionary to the aggfunc and performing len operation on Quantity and mean, sum operations on Price.

Here is the output attaching:

enter image description here

The example is taken from pivot table explained.

Ganesh_
  • 569
  • 7
  • 10
  • 1
    Great one @ganesh. However i noticed that one value can be mapped only to once aggfunc. If you attempt to map same value with other aggfunc, only the first mapping works. Have you come across this situation? – Sagar Dawda Apr 20 '18 at 10:17
8

The aggfunc argument of pivot_table takes a function or list of functions but not dict

aggfunc : function, default numpy.mean, or list of functions If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)

So try

n_page = (pd.pivot_table(Main_DF, 
                         values='SPC_RAW_VALUE',  
                         index=['ALIAS', 'SPC_PRODUCT', 'LABLE', 'RAW_PARAMETER_NAME'], 
                         columns=['LOT_VIRTUAL_LINE'],
                         aggfunc=[len, np.mean, np.std])
          .reset_index()
         )

You may want to rename the hierarchical columns afterwards.

Happy001
  • 6,103
  • 2
  • 23
  • 16
4

Try using groupby

df = (Main_DF
      .groupby(['ALIAS', 'SPC_PRODUCT', 'LABLE', 'RAW_PARAMETER_NAME'], as_index=False)
      .LOT_VIRTUAL_LINE
      .agg({'N': 'count', 'Mean': np.mean, 'Sigma': np.std})
     )

Setting as_index=False just leaves these as columns in your dataframe so you don't have to reset the index afterwards.

Alexander
  • 105,104
  • 32
  • 201
  • 196