What I have
I have a pandas frame like this:
df1 = pd.DataFrame({
'date': ['31-05-2017', '31-05-2017', '31-05-2017', '31-05-2017', '01-06-2017', '01-06-2017'],
'tag': ['A', 'B', 'B', 'B', 'A', 'A'],
'metric1': [0, 0, 0, 1, 1, 1],
'metric2': [0, 1, 1, 0, 1, 0]
})
df2 = pd.DataFrame({
'date': ['31-05-2017', '31-05-2017', '01-06-2017'],
'tag': ['A', 'B', 'A'],
'metric3': [25, 3, 7,]
})
What I want
1) I want to sum metric
and metric_2
per each combination of date
and tag
2) compute the percentage of entries being 1
in metric_2
3) merge grouped df1 with df2 so that I have metric_3
for each date
and tag
date | tag | metric1_sum | metric2_sum | metric2_percentage| metric 3
-----------|-----|-------------|-------------|-------------------|---------
31-05-2017 | A | 0 | 0 | 0 | 25
31-05-2017 | B | 1 | 2 | 0.667 | 3
01-06-2017 | A | 1 | 0 | 0.5 | 7
Attempts
(1) group and sum works
>>> g = df1.groupby(['date', 'tag']).agg(sum)
>>> g
metric1 metric2
date tag
01-06-2017 A 2 1
31-05-2017 A 0 0
B 1 2
(2) calculating the percentage works but adding it as a column does not
I used the method posted here to calculate the percentage.
>>> g2 = df1.groupby(['date', 'tag']).agg({'metric2': 'sum'})
>>> g2.groupby(level=0).apply(lambda x: x/float(x.sum()))
metric2
date tag
01-06-2017 A 1.0
31-05-2017 A 0.0
B 1.0
But, how can I now assign this grouped metric2
to a column metric2_percentage
in my groups g
or my df1
?
(3) unsuccessful merge
Merging with the group apparently does not work:
>>> pd.merge(g, df2, how='left', on=['date', 'tag'])
KeyError: 'date'
How can I then reduce df1
to one row per group so that I can merge it with df2
?