Answers to your questions
- Difference between pivot and pivot_table
You are already heading in the right direction, but there is one important difference:
pivot
raises a ValueError if any index/columns pair is not unique. In your case,
the rows at index 0 and index 4 share the same Year/Company pair "2006 ABC",
so use pivot_table,
which is a generalization of pivot that aggregates duplicate entries.
See the note at the end of the reshaping section of the pandas documentation for more details.
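As a minimal sketch of that difference, the snippet below uses a small made-up table (the values are illustrative, not your data) in which two rows share the pair 2006/ABC:

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 share the Year/Company pair (2006, "ABC")
df = pd.DataFrame({
    "Year": [2006, 2006, 2006, 2007],
    "Company": ["ABC", "XYZ", "ABC", "ABC"],
    "Number": [10, 20, 30, 40],
})

try:
    df.pivot(index="Year", columns="Company", values="Number")
    pivot_failed = False
except ValueError:
    # pivot refuses to reshape when an index/columns pair is duplicated
    pivot_failed = True

# pivot_table aggregates the duplicates instead (here by summing them)
table = df.pivot_table(index="Year", columns="Company", values="Number", aggfunc="sum")
print(pivot_failed)
print(table)
```

Combinations that never occur in the data (2007/XYZ here) show up as NaN in the pivot table.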
- Apply function to DataFrame
You cannot chain functions by nesting them directly. Writing
math.log(numpy.sum)
asks Python for the log of the function object itself, which raises a TypeError.
What you want is the log of the result of calling that function, which is a different thing.
The pivot table is returned as a new DataFrame object, so you can
apply your log function to it in the following way:
pseudo code:
df = read(input)
df2 = pivot(df)
df3 = df2['some_column_index'].apply(func1).apply(func2)  # ... and so on, up to .apply(funcN)
See the Function application section of the pandas documentation for more details.
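The difference between passing a function and passing its result can be sketched like this (the numbers are made up for illustration, and log10 is used only to keep the result easy to read):

```python
import math
import numpy as np

values = [10, 20, 70]  # hypothetical numbers

try:
    math.log(np.sum)  # the argument is the function object, not a number
    raised = False
except TypeError:
    raised = True

# Call the inner function first, then take the log of its result
total = np.sum(values)
log_total = math.log10(total)
print(raised, total, log_total)
```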
Code example
A complete example looks like this:
import pandas as pd
import numpy as np
# read data from input; a multi-character separator is handled by the Python parsing engine
df = pd.read_csv('input.csv', sep=', ', index_col=0, engine='python')
# pivot the input DataFrame into a new one, summing duplicate Year/Company pairs
df2 = df.pivot_table(index="Year", columns="Company", values="Number", aggfunc="sum")
# apply the base-10 log to every value and keep the result as df2
df2 = df2.apply(np.log10)
# display
df2