Pandas Groupby with bin sum aggregation

Question

I have a similar question to this one

I have a dataframe in pandas that looks like this - showing ages at which different users won awards.

id	awards	age
1	100	24
1	150	26
1	50	54
2	193	34
2	209	50

Interested in computing total awards for age intervals i.e. 0 (0-8 years old), 1 (9 - 17 years old), 2 (18-26 years old), 3 (27-35 years old), 4 (26 - 44 years old) ... etc. Each person should have as many age intervals as necessary for the oldest person

How can I group them by id and by 9 year age intervals to get something like this:

id.	total_awards	age_interval
1	0	0
1	0	1
1	250	2
1	0	3
1	0	4
1	0	5
1	50	6
2	0	0
2	0	1
2	0	2
2	193	3
2	0	4
2	209	5
2	0	6

Is there any issue with my answer you have to pick another answer which gives result not similar to your desired output ? Let me know and see whether I can fine-tune it. — SeaBean, Sep 10 '21 at 16:21

score 2 · Accepted Answer · answered Jul 26 '21 at 22:46

You can define the bins and cuts as follows:

bins = [9 * i for i in range(0, df['age'].max() // 9 + 2)]
cuts = pd.cut(df['age'], bins, right=False)

print(cuts)

0    [18, 27)
1    [18, 27)
2    [54, 63)
3    [27, 36)
4    [45, 54)
Name: age, dtype: category
Categories (7, interval[int64, left]): [[0, 9) < [9, 18) < [18, 27) < [27, 36) < [36, 45) < [45, 54) < [54, 63)]

Then, group by id and the cuts and sum awards for the cuts to get total_awards. Create age_interval by GroupBy.cumcount()

df_out = (df.groupby(['id', cuts])
            .agg(total_awards=('awards', 'sum'))
            .reset_index(level=0)
            .reset_index(drop=True)
         )
df_out['age_interval'] = df_out.groupby('id').cumcount()

Result:

print(df_out)

    id  total_awards  age_interval
0    1             0             0
1    1             0             1
2    1           250             2
3    1             0             3
4    1             0             4
5    1             0             5
6    1            50             6
7    2             0             0
8    2             0             1
9    2             0             2
10   2           193             3
11   2             0             4
12   2           209             5
13   2             0             6

score 1 · Answer 2 · answered Jul 26 '21 at 22:32

Pretty sure this covers what you are looking for

df = pd.read_clipboard()
bins = [i for i in range(0, 100 ,9)]
results = df.groupby(['id', pd.cut(df.age, bins)])['awards'].sum().reset_index()
print(results)

    id  age         awards
0   1   (0, 9]      NaN
1   1   (9, 18]     NaN
2   1   (18, 27]    250.0
3   1   (27, 36]    NaN
4   1   (36, 45]    NaN
5   1   (45, 54]    50.0
6   1   (54, 63]    NaN
7   1   (63, 72]    NaN
8   1   (72, 81]    NaN
9   1   (81, 90]    NaN
10  1   (90, 99]    NaN
11  2   (0, 9]      NaN
12  2   (9, 18]     NaN
13  2   (18, 27]    NaN
14  2   (27, 36]    193.0
15  2   (36, 45]    NaN
16  2   (45, 54]    209.0
17  2   (54, 63]    NaN
18  2   (63, 72]    NaN
19  2   (72, 81]    NaN
20  2   (81, 90]    NaN
21  2   (90, 99]    NaN

Pandas Groupby with bin sum aggregation

2 Answers2

Linked