How can I combine groupby + condition + some operation, when using pandas?

Question

Given the following dataframe -

    id  value  is_mapping
0   1      1           1
1   1      1           1
2   1      1           0
3   2      1           0
4   2      1           0
5   2      1           0
6   3      1           1
7   3      1           1
8   3      1           1

I want to groupby id and sum the values of each id, and only those when is_mapping == 1

The result should be -

   id  sum_result
0   1           2
1   2           0
2   3           3

Is there a way to do that without apply?

score 1 · Answer 1 · answered Apr 03 '23 at 08:46

1

Making a reasonable assumption that the is_mapping column would only take 0/1 values :

(df['value'] * df['is_mapping']).groupby(df['id']).sum().reset_index(name='sum_result')
   id  sum_result
0   1           2
1   2           0
2   3           3

answered Apr 03 '23 at 08:46

wkgrcdsam

236
1
2
11

jezrael · Answer 2 · 2023-04-03T08:38:54.897

Use Series.where for set 0 for values not match conditiona and then aggregate sum:

df1 = (df['value'].where(df['is_mapping'].eq(1), 0)
                  .groupby(df['id'])
                  .sum()
                  .reset_index(name='sum_result'))
print (df1)
   id  sum_result
0   1           2
1   2           0
2   3           3

Or create helper column with same idea:

df1 = (df.assign(sum_result=df['value'].where(df['is_mapping'].eq(1), 0))
         .groupby('id', as_index=False)['sum_result']
         .sum())
print (df1)
   id  sum_result
0   1           2
1   2           0
2   3           3

Another solution with boolean indexing and Series.reindex for add not exist groups with 0 values:

df1 = (df[df['is_mapping'].eq(1)]
         .groupby('id')['value']
         .sum()
         .reindex(df['id'].unique(), fill_value=0)
         .reset_index(name='sum_result'))
print (df1)
   id  sum_result
0   1           2
1   2           0
2   3           3

PaulS · Answer 3 · 2023-04-03T09:18:24.023

A possible solution, using numpy:

a = df.values
id = np.unique(a[:,0])

pd.DataFrame({
    'id': id, 
    'sum_result': [np.sum(a[a[:,0] == x, 1] * a[a[:,0] == x,2]) for x in id]})

Another possible solution:

(df['id'].to_frame()
 .merge(df.loc[df['is_mapping'].astype(bool)]
        .groupby('id')['value'].sum(), on='id', how='left')
 .drop_duplicates().fillna(0).rename({'value': 'sum_result'}, axis=1))

Output:

   id  sum_result
0   1         2.0
3   2         0.0
6   3         3.0

How can I combine groupby + condition + some operation, when using pandas?

3 Answers3