2

I want to change my table structure for plotting

Input:

Year Company Number
0 2006 ABC 500
1 2007 ABC 700
2 2007 PQR 800
3 2006 PQR 600
4 2006 ABC 500

to output:

Year ABC PQR
2006 1000 600
2007 700 800

In addition to sum, i also want to take log of the sum. For eg in Year=2006, under column ABC, i want 3 (log1000). Similarly, for other columns as well.

I tried aggfunc=math.log(np.sum) in pivot_table but this is not working.

hans
  • 31
  • 3

3 Answers3

0

Use:

df.pivot_table(index='Year', columns='Company', values='Number',
               aggfunc=lambda x: np.log10(np.sum(x)))

output:

Company       ABC       PQR
Year                       
2006     3.000000  2.778151
2007     2.845098  2.903090
mozway
  • 194,879
  • 13
  • 39
  • 75
0

Why not just:

>>> df.pivot_table('Number', *df.columns[:-1], aggfunc=sum)
Company   ABC  PQR
Year              
2006     1000  600
2007      700  800
>>> 

Or:

>>> df.pivot_table('Number', 'Year', 'Company', aggfunc=sum)
Company   ABC  PQR
Year              
2006     1000  600
2007      700  800
>>> 
U13-Forward
  • 69,221
  • 14
  • 89
  • 114
0

Answer to questions

  1. Difference between pivot and pivot_table

You already pick the right direction but there is a little difference: pivot will raise ValueError if index/columns pair is not unique. In your case, both row in index 0 and index 4 share the same Year/Company pair "2006 ABC", use pivot_table as a generalization of pivot.

See note at the end of section reshaping for more details.

  1. Apply function to DataFrame

You cannot apply function in chain by concat them directly, by using math.log(numpy.sum) you instruct python to return a log of function object, but what you want is a log of the result of that function object, that is difference.

Instead, the pivot table returned is a new DataFrame object, which is suitable to apply your log function like the following way:

pseudo code:

df = read(input)
df2 = pivot(df)
df3 = df2['some_column_index'].apply(func1).apply(func2).[...apply(funcN)]

see Function application for more details.

Code example

A complete code like below:

import pandas as pd
import numpy as np

# read data from input

df = pd.read_csv('input.csv', sep=', ', index_col=0)

# pivot your input DataFrame and return a new one

df2 = df.pivot_table(index="Year", columns="Company", values="Number", aggfunc=np.sum)

# operate log function on df2

logyear = df2.apply(np.log10)

# update df2

df2.update(logyear)

# display

df2
Allen Paul
  • 77
  • 3