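I have a DataFrame with MultiIndex columns in which each cell holds either a list of tuples or NaN. For each top-level column, I want to combine the cells across the second level into one flat list per row (flattening the lists, not the tuples), but I'm struggling with it. Here's what I have: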
import numpy as np
import pandas as pd
df = pd.DataFrame([[[(1, 'a')], [(6, 'b')], np.nan, np.nan], [[(5, 'd'), (10, 'e')], np.nan, np.nan, [(8, 'c')]]])
df.columns = pd.MultiIndex.from_tuples([('a', 0), ('a', 1), ('b', 0), ('b', 1)])
>>> df
                   a              b
                   0         1    0         1
0           [(1, a)]  [(6, b)]  NaN       NaN
1  [(5, d), (10, e)]       NaN  NaN  [(8, c)]
Desired result:
>>> df
                        a              b
0        [(1, a), (6, b)]     [NaN, NaN]
1  [(5, d), (10, e), NaN]  [NaN, (8, c)]
How do I do this? From this related question, I tried the following, but neither attempt gives the result I want:
>>> df.stack(level=1).groupby(level=[0]).agg(lambda x: np.array(list(x)).flatten())
   a  b
0  a  b
1  a  b
>>> df.stack(level=1).groupby(level=[0]).agg(lambda x: np.concatenate(list(x)))
...
Exception: Must produce aggregated value
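For reference, here is a minimal sketch of a plain-Python route that seems to produce the desired layout without going through `groupby().agg()`. It assumes every non-list cell (the NaN values here) should be kept as a single element of the combined list, which is how I read the desired result above. The helper name `flatten_cells` and the variable `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[[(1, 'a')], [(6, 'b')], np.nan, np.nan],
                   [[(5, 'd'), (10, 'e')], np.nan, np.nan, [(8, 'c')]]])
df.columns = pd.MultiIndex.from_tuples([('a', 0), ('a', 1), ('b', 0), ('b', 1)])


def flatten_cells(cells):
    """Concatenate list cells; keep any non-list cell (e.g. NaN) as one element.

    Hypothetical helper, not part of pandas.
    """
    out = []
    for cell in cells:
        if isinstance(cell, list):
            out.extend(cell)   # flatten the list, leave its tuples intact
        else:
            out.append(cell)   # assumption: NaN stays as a placeholder
    return out


# Build one column per top-level key; df[key] selects that key's sub-columns.
top_level_keys = df.columns.get_level_values(0).unique()
result = pd.DataFrame(
    {key: [flatten_cells(row) for row in df[key].to_numpy()]
     for key in top_level_keys},
    index=df.index,
)
print(result)
```

This sidesteps the "Must produce aggregated value" error because nothing is asked to reduce to a scalar inside `agg`; the lists are assembled row by row and handed to the DataFrame constructor as ordinary object cells.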