I have a dataframe of forum posts extracted between 2017-11 and 2023-01, including the month the member joined (period), their age and a binary classifier. The data looks something like this:
period | binary_target | age |
---|---|---|
2018-04 | 0 | 2 |
2018-04 | 0 | 2 |
2018-04 | 1 | 4 |
2018-04 | 0 | 0 |
... | ... | ... |
2022-08 | 0 | 27 |
2022-08 | 0 | 11 |
2022-08 | 1 | 1 |
Is it possible to use the df.groupby() function to groupby period and age at the same time, in order to calculate the proportion of binary_target = 1 for each combination of period and age? Ideally, I would then store the results in a dataframe that looks something like the below:
period (age) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2017-11 | 0.29 | 0.33 | 0.35 | 0.35 | 0.35 | 0.30 | 0.34 | 0.33 | 0.33 | ... | ||||
2017-12 | 0.31 | 0.29 | 0.34 | 0.35 | 0.38 | 0.32 | 0.29 | 0.36 | 0.32 | ... | ||||
2018-01 | 0.32 | 0.29 | 0.36 | 0.32 | 0.37 | 0.33 | 0.34 | 0.41 | 0.37 | ... | ||||
2018-02 | 0.30 | 0.34 | 0.33 | 0.33 | 0.40 | 0.31 | 0.31 | 0.29 | 0.34 | ... | ||||
2018-03 | 0.30 | 0.34 | 0.33 | 0.33 | 0.40 | 0.33 | 0.34 | 0.35 | 0.38 | ... | ||||
2018-04 | 0.31 | 0.29 | 0.34 | 0.35 | 0.38 | 0.31 | 0.29 | 0.34 | 0.39 | ... | ||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |