On pandas version 0.19.2, I have the Series below with a MultiIndex:
import pandas as pd
import numpy as np
arrays = [[2001, 2001, 2002, 2002, 2002, 2003, 2004, 2004],
['A', 'B', 'A', 'C', 'D', 'B', 'C', 'D']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index, name='signal')
It looks as follows:
first  second
2001   A        -2.48
       B         0.95
2002   A         0.55
       C         0.65
       D        -1.32
2003   B        -0.25
2004   C         0.86
       D        -0.31
I want a summary contingency DataFrame whose columns are the unique values of "second" and whose index is the "first" level, like below:
          A      B      C      D
2001  -2.48   0.95    NaN    NaN
2002   0.55    NaN   0.65  -1.32
2003    NaN  -0.25    NaN    NaN
2004    NaN    NaN   0.86  -0.31
Any idea how this can be done? I played around with groupby() as below, but could not get anywhere:
s.groupby(level=1).apply(lambda x: "to do")
Linked question: Python Pandas - how to do group by on a multiindex
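For reference, a minimal sketch of one approach that might give the shape described above: Series.unstack, which pivots an index level into columns. This is an assumption on my part rather than something confirmed to be the best route. The fixed `values` list (copied from the printed sample, in place of np.random.randn) and the name `table` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same index as in the question. Fixed values (copied from the printed
# sample) replace np.random.randn so the result is reproducible.
arrays = [[2001, 2001, 2002, 2002, 2002, 2003, 2004, 2004],
          ['A', 'B', 'A', 'C', 'D', 'B', 'C', 'D']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
values = [-2.48, 0.95, 0.55, 0.65, -1.32, -0.25, 0.86, -0.31]
s = pd.Series(values, index=index, name='signal')

# Pivot the inner index level ('second') into columns.
# (first, second) pairs that do not occur in the Series become NaN.
table = s.unstack(level='second')
print(table)
```

Here no aggregation is needed because each (first, second) pair occurs at most once; with duplicate pairs, unstack would raise an error and a groupby or pivot_table with an explicit aggregation would be required first.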