0

I have a dataframe of forum posts extracted between 2017-11 and 2023-01, including the month the member joined (period), their age and a binary classifier. The data looks something like this:

period binary_target age
2018-04 0 2
2018-04 0 2
2018-04 1 4
2018-04 0 0
... ... ...
2022-08 0 27
2022-08 0 11
2022-08 1 1

Is it possible to use the df.groupby() function to groupby period and age at the same time, in order to calculate the proportion of binary_target = 1 for each combination of period and age? Ideally, I would then store the results in a dataframe that looks something like the below:

period (age) 1 2 3 4 5 6 7 8 9 ...
2017-11 0.29 0.33 0.35 0.35 0.35 0.30 0.34 0.33 0.33 ...
2017-12 0.31 0.29 0.34 0.35 0.38 0.32 0.29 0.36 0.32 ...
2018-01 0.32 0.29 0.36 0.32 0.37 0.33 0.34 0.41 0.37 ...
2018-02 0.30 0.34 0.33 0.33 0.40 0.31 0.31 0.29 0.34 ...
2018-03 0.30 0.34 0.33 0.33 0.40 0.33 0.34 0.35 0.38 ...
2018-04 0.31 0.29 0.34 0.35 0.38 0.31 0.29 0.34 0.39 ...
... ... ... ... ... ... ... ... ... ... ...
Connor95
  • 97
  • 4
  • `df.pivot_table(index='period', columns='age', values='binary_target', aggfunc='mean', fill_value=0)` – mozway Mar 24 '23 at 16:40

1 Answers1

1

You want a pivot table.

df.pivot_table(index='period', columns='age', values='binary_target', aggfunc='mean')

Using mean as the aggregation function will give the proportion of 1s vs 0s

James
  • 32,991
  • 4
  • 47
  • 70