pandas: change table structue

Question

I want to change my table structure for plotting

Input:

	Year	Company	Number
0	2006	ABC	500
1	2007	ABC	700
2	2007	PQR	800
3	2006	PQR	600
4	2006	ABC	500

to output:

Year	ABC	PQR
2006	1000	600
2007	700	800

In addition to sum, i also want to take log of the sum. For eg in Year=2006, under column ABC, i want 3 (log1000). Similarly, for other columns as well.

I tried aggfunc=math.log(np.sum) in pivot_table but this is not working.

mozway · Accepted Answer · 2021-09-11T07:34:00.417

0

Use:

df.pivot_table(index='Year', columns='Company', values='Number',
               aggfunc=lambda x: np.log10(np.sum(x)))

output:

Company       ABC       PQR
Year                       
2006     3.000000  2.778151
2007     2.845098  2.903090

edited Sep 11 '21 at 07:34

answered Sep 11 '21 at 07:17

mozway

194,879
13
39
75

score 0 · Answer 2 · answered Sep 11 '21 at 07:20

Why not just:

>>> df.pivot_table('Number', *df.columns[:-1], aggfunc=sum)
Company   ABC  PQR
Year              
2006     1000  600
2007      700  800
>>>

Or:

>>> df.pivot_table('Number', 'Year', 'Company', aggfunc=sum)
Company   ABC  PQR
Year              
2006     1000  600
2007      700  800
>>>

score 0 · Answer 3 · answered Sep 11 '21 at 12:17

Answer to questions

Difference between pivot and pivot_table

You already pick the right direction but there is a little difference: pivot will raise ValueError if index/columns pair is not unique. In your case, both row in index 0 and index 4 share the same Year/Company pair "2006 ABC", use pivot_table as a generalization of pivot.

See note at the end of section reshaping for more details.

Apply function to DataFrame

You cannot apply function in chain by concat them directly, by using math.log(numpy.sum) you instruct python to return a log of function object, but what you want is a log of the result of that function object, that is difference.

Instead, the pivot table returned is a new DataFrame object, which is suitable to apply your log function like the following way:

pseudo code:

df = read(input)
df2 = pivot(df)
df3 = df2['some_column_index'].apply(func1).apply(func2).[...apply(funcN)]

see Function application for more details.

Code example

A complete code like below:

import pandas as pd
import numpy as np

# read data from input

df = pd.read_csv('input.csv', sep=', ', index_col=0)

# pivot your input DataFrame and return a new one

df2 = df.pivot_table(index="Year", columns="Company", values="Number", aggfunc=np.sum)

# operate log function on df2

logyear = df2.apply(np.log10)

# update df2

df2.update(logyear)

# display

df2

pandas: change table structue

3 Answers3

Answer to questions

Code example