I have the following data as setup :
import pandas as pd
df = pd.DataFrame({
'index_1' : ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'index_2' : ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
'month' : ['jan', 'jan', 'feb', 'feb', 'jan', 'jan', 'feb', 'feb'],
'value_1' : range(0, 8),
'value_2' : range(8, 16)
})
print(df)
# index_1 index_2 month value_1 value_2
# 0 A A jan 0 8
# 1 A B jan 1 9
# 2 A C feb 2 10
# 3 A D feb 3 11
# 4 B A jan 4 12
# 5 B B jan 5 13
# 6 B C feb 6 14
# 7 B D feb 7 15
My expected output would look like this... (I did it by hand)
print(expected_output)
# jan feb
# month value_1 value_2 value_1 value_2
# index_1 index_2
# A A 0.0 8.0 NaN NaN
# B 1.0 9.0 NaN NaN
# C NaN NaN 2.0 10.0
# D NaN NaN 3.0 11.0
# B A 4.0 12.0 NaN NaN
# B 5.0 13.0 NaN NaN
# C NaN NaN 6.0 14.0
# D NaN NaN 7.0 15.0
There must be something I cannot wrap my mind around. I achieved the following, which is the good data, in a wrong format.
df = pd.pivot_table(
df,
index=['index_1', 'index_2'],
columns=['month'],
# The 2 following lines are implicit, and don't change the output.
# values=['value_1', 'value_2'],
# aggfunc='sum'
)
print(df)
# value_1 value_2
# month feb jan feb jan
# index_1 index_2
# A A NaN 0.0 NaN 8.0
# B NaN 1.0 NaN 9.0
# C 2.0 NaN 10.0 NaN
# D 3.0 NaN 11.0 NaN
# B A NaN 4.0 NaN 12.0
# B NaN 5.0 NaN 13.0
# C 6.0 NaN 14.0 NaN
# D 7.0 NaN 15.0 NaN
I also tried using some .groupby()
, along with .transpose()
, but I have a hard time correctly formatting this DataFrame. I have already read the following documentation pivot_table, reshaping dataframe and this cannonical by PiRSquared.