22

I want to make a pivot table from the following dataframe with columns sales, rep. The pivot table shows sales but no rep. When I tried with only rep, I got the error DataError: No numeric types to aggregate. How to fix this such that I see both the numeric field sales and the field(string) rep

data = {'year': ['2016', '2016', '2015', '2014', '2013'],
        'country':['uk', 'usa', 'fr','fr','uk'],
        'sales': [10, 21, 20, 10,12],
        'rep': ['john', 'john', 'claire', 'kyle','kyle']
        }

print pd.DataFrame(data).pivot_table(index='country', columns='year', values=['rep','sales'])

        sales               
year     2013 2014 2015 2016
country                     
fr        NaN   10   20  NaN
uk         12  NaN  NaN   10
usa       NaN  NaN  NaN   21


print pd.DataFrame(data).pivot_table(index='country', columns='year', values=['rep'])
DataError: No numeric types to aggregate
dumbledad
  • 16,305
  • 23
  • 120
  • 273
DougKruger
  • 4,424
  • 14
  • 41
  • 62
  • 1
    It depends on what you are trying to do. The default agg function is 'mean' and you cannot take the mean of sales reps. Either change the agg function or pass another column for values. If you just want to use pivot, use pivot instead of pivot_table. – ayhan Aug 30 '16 at 13:36

2 Answers2

58

You could use set_index and unstack:

df = pd.DataFrame(data)
df.set_index(['year','country']).unstack('year')

yields

          rep                     sales                  
year     2013  2014    2015  2016  2013  2014  2015  2016
country                                                  
fr       None  kyle  claire  None   NaN  10.0  20.0   NaN
uk       kyle  None    None  john  12.0   NaN   NaN  10.0
usa      None  None    None  john   NaN   NaN   NaN  21.0

Or, using pivot_table with aggfunc='first':

df.pivot_table(index='country', columns='year', values=['rep','sales'], aggfunc='first')

yields

          rep                     sales                  
year     2013  2014    2015  2016  2013  2014  2015  2016
country                                                  
fr       None  kyle  claire  None  None    10    20  None
uk       kyle  None    None  john    12  None  None    10
usa      None  None    None  john  None  None  None    21

With aggfunc='first', each (country, year, rep) or (country, year, sales) group is aggregrated by taking the first value found. In your case there appears to be no duplicates, so the first value is the same as the only value.

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
4

It seems that the problem comes from the different types for column rep and sales, if you convert the sales to str type and specify the aggfunc as sum, it works fine:

df.sales = df.sales.astype(str)

pd.pivot_table(df, index=['country'], columns=['year'], values=['rep', 'sales'], aggfunc='sum')

#        rep                            sales
#  year 2013    2014    2015    2016    2013    2014    2015    2016
# country                               
# fr    None    kyle    claire  None    None      10      20    None
# uk    kyle    None    None    john      12    None    None    10
#usa    None    None    None    john    None    None    None    21
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • The problem comes also if there no duplicate in the index columns, in this case `'country'`. If one of the values columns (`'rep'` and `'sales'`) are mixed types, the error will raise. – oshribr Dec 31 '18 at 09:06