Dataframe.groupby() based on two criteria

Question

I have a dataframe of forum posts extracted between 2017-11 and 2023-01, including the month the member joined (period), their age and a binary classifier. The data looks something like this:

period	binary_target	age
2018-04	0	2
2018-04	0	2
2018-04	1	4
2018-04	0	0
...	...	...
2022-08	0	27
2022-08	0	11
2022-08	1	1

Is it possible to use the df.groupby() function to groupby period and age at the same time, in order to calculate the proportion of binary_target = 1 for each combination of period and age? Ideally, I would then store the results in a dataframe that looks something like the below:

period (age)	1	2	3	4	5	6	7	8	9	...
2017-11	0.29	0.33	0.35	0.35	0.35	0.30	0.34	0.33	0.33	...
2017-12	0.31	0.29	0.34	0.35	0.38	0.32	0.29	0.36	0.32	...
2018-01	0.32	0.29	0.36	0.32	0.37	0.33	0.34	0.41	0.37	...
2018-02	0.30	0.34	0.33	0.33	0.40	0.31	0.31	0.29	0.34	...
2018-03	0.30	0.34	0.33	0.33	0.40	0.33	0.34	0.35	0.38	...
2018-04	0.31	0.29	0.34	0.35	0.38	0.31	0.29	0.34	0.39	...
...	...	...	...	...	...	...	...	...	...	...

`df.pivot_table(index='period', columns='age', values='binary_target', aggfunc='mean', fill_value=0)` — mozway, Mar 24 '23 at 16:40

score 1 · Answer 1 · answered Mar 24 '23 at 16:39

1

You want a pivot table.

df.pivot_table(index='period', columns='age', values='binary_target', aggfunc='mean')

Using mean as the aggregation function will give the proportion of 1s vs 0s

answered Mar 24 '23 at 16:39

James

32,991
4
47
70

Dataframe.groupby() based on two criteria

1 Answers1