When I summarize a DataFrame and merge the summary back onto the original DataFrame, I have trouble working with the resulting column names.
This is the original dataframe:
import pandas as pd
d = {'col1': ["a", "a", "b", "a", "b", "a"], 'col2': [0, 4, 3, -5, 3, 4]}
df = pd.DataFrame(data=d)
Now I calculate some statistics and merge the summary back in:
group_summary = df.groupby('col1', as_index=False).agg({'col2': ['mean', 'count']})
df = pd.merge(df, group_summary, on='col1')
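Passing a dict of lists to `agg` gives the summary two-level (`MultiIndex`) columns, which is where the tuple-like names come from. A minimal sketch that inspects `group_summary` on its own, before any merge (it rebuilds the same data so it runs standalone):

```python
import pandas as pd

d = {'col1': ["a", "a", "b", "a", "b", "a"], 'col2': [0, 4, 3, -5, 3, 4]}
df = pd.DataFrame(data=d)

# Same summary as above: a dict of lists makes agg return MultiIndex columns
group_summary = df.groupby('col1', as_index=False).agg({'col2': ['mean', 'count']})

print(group_summary.columns.nlevels)   # number of column levels
print(group_summary.columns.tolist())  # each label is a tuple, not a string

# Because each label is a tuple, a tuple (not a string) selects the column
print(group_summary[('col2', 'mean')].tolist())
```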
The dataframe has some strange column names now:
df
Out:
  col1  col2  (col2, mean)  (col2, count)
0    a     0          0.75              4
1    a     4          0.75              4
2    a    -5          0.75              4
3    a     4          0.75              4
4    b     3          3.00              2
5    b     3          3.00              2
I know I can access these columns by position, like df.iloc[:, 2], but I would also like to access them by name, like df['(col2, mean)']. However, this raises a KeyError.
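The names displayed as `(col2, mean)` are tuples, so the string `'(col2, mean)'` is not a column label. A minimal sketch of one common workaround, assuming underscore-joined names such as `col2_mean` are acceptable (that naming scheme is an arbitrary choice, not something pandas requires): flatten the summary's columns before merging, so every label in the merged frame is a plain string. This also avoids merging frames whose column indexes have different numbers of levels, which newer pandas releases may reject.

```python
import pandas as pd

d = {'col1': ["a", "a", "b", "a", "b", "a"], 'col2': [0, 4, 3, -5, 3, 4]}
df = pd.DataFrame(data=d)

group_summary = df.groupby('col1', as_index=False).agg({'col2': ['mean', 'count']})

# Flatten ('col2', 'mean') -> 'col2_mean' and ('col1', '') -> 'col1';
# filter(None, ...) drops the empty second level of the key column
group_summary.columns = ['_'.join(filter(None, col)) for col in group_summary.columns]

merged = pd.merge(df, group_summary, on='col1')

# Plain string labels now work for column access
print(merged.columns.tolist())
print(merged['col2_mean'])
```

The join-based renaming keeps the aggregated column name and the statistic together, so adding more statistics to the `agg` call still yields unique, readable names.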
Source: This grew out of this previous question.