-2

I have a similar question to this one

I have a dataframe in pandas that looks like this - showing ages at which different users won awards.

id awards age
1 100 24
1 150 26
1 50 54
2 193 34
2 209 50

Interested in computing total awards for age intervals i.e. 0 (0-8 years old), 1 (9 - 17 years old), 2 (18-26 years old), 3 (27-35 years old), 4 (26 - 44 years old) ... etc. Each person should have as many age intervals as necessary for the oldest person

How can I group them by id and by 9 year age intervals to get something like this:

id. total_awards age_interval
1 0 0
1 0 1
1 250 2
1 0 3
1 0 4
1 0 5
1 50 6
2 0 0
2 0 1
2 0 2
2 193 3
2 0 4
2 209 5
2 0 6
SeaBean
  • 22,547
  • 3
  • 13
  • 25

2 Answers2

2

You can define the bins and cuts as follows:

bins = [9 * i for i in range(0, df['age'].max() // 9 + 2)]
cuts = pd.cut(df['age'], bins, right=False)

print(cuts)

0    [18, 27)
1    [18, 27)
2    [54, 63)
3    [27, 36)
4    [45, 54)
Name: age, dtype: category
Categories (7, interval[int64, left]): [[0, 9) < [9, 18) < [18, 27) < [27, 36) < [36, 45) < [45, 54) < [54, 63)]

Then, group by id and the cuts and sum awards for the cuts to get total_awards. Create age_interval by GroupBy.cumcount()

df_out = (df.groupby(['id', cuts])
            .agg(total_awards=('awards', 'sum'))
            .reset_index(level=0)
            .reset_index(drop=True)
         )
df_out['age_interval'] = df_out.groupby('id').cumcount()

Result:

print(df_out)

    id  total_awards  age_interval
0    1             0             0
1    1             0             1
2    1           250             2
3    1             0             3
4    1             0             4
5    1             0             5
6    1            50             6
7    2             0             0
8    2             0             1
9    2             0             2
10   2           193             3
11   2             0             4
12   2           209             5
13   2             0             6
SeaBean
  • 22,547
  • 3
  • 13
  • 25
1

Pretty sure this covers what you are looking for

df = pd.read_clipboard()
bins = [i for i in range(0, 100 ,9)]
results = df.groupby(['id', pd.cut(df.age, bins)])['awards'].sum().reset_index()
print(results)
    id  age         awards
0   1   (0, 9]      NaN
1   1   (9, 18]     NaN
2   1   (18, 27]    250.0
3   1   (27, 36]    NaN
4   1   (36, 45]    NaN
5   1   (45, 54]    50.0
6   1   (54, 63]    NaN
7   1   (63, 72]    NaN
8   1   (72, 81]    NaN
9   1   (81, 90]    NaN
10  1   (90, 99]    NaN
11  2   (0, 9]      NaN
12  2   (9, 18]     NaN
13  2   (18, 27]    NaN
14  2   (27, 36]    193.0
15  2   (36, 45]    NaN
16  2   (45, 54]    209.0
17  2   (54, 63]    NaN
18  2   (63, 72]    NaN
19  2   (72, 81]    NaN
20  2   (81, 90]    NaN
21  2   (90, 99]    NaN
fthomson
  • 773
  • 3
  • 9