Pandas: Creating aggregated column in DataFrame

Question

With the DataFrame below as an example,

In [83]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})
df
Out[83]:
   A  B  values
0  1  1      10
1  1  2      15
2  2  1      20
3  2  2      25

What would be a simple way to generate a new column containing an aggregation of the data over the groups defined by one of the columns?

For example, if I sum values over the groups in A:

In [84]:
df.groupby('A').sum()['values']
Out[84]:
A
1    25
2    45
Name: values

How can I get the following result?

   A  B  values  sum_values_A
0  1  1      10            25
1  1  2      15            25
2  2  1      20            45
3  2  2      25            45

Related: (1) https://stackoverflow.com/questions/45346986/pandas-group-by-count-and-add-count-to-original-dataframe (2) https://stackoverflow.com/questions/17432944/python-pandas-error-when-doing-groupby-counts — Anton Tarasenko, May 24 '19 at 12:05

score 49 · Accepted Answer · answered Nov 06 '12 at 19:07

In [20]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})

In [21]: df
Out[21]:
   A  B  values
0  1  1      10
1  1  2      15
2  2  1      20
3  2  2      25

In [22]: df['sum_values_A'] = df.groupby('A')['values'].transform('sum')

In [23]: df
Out[23]:
   A  B  values  sum_values_A
0  1  1      10            25
1  1  2      15            25
2  2  1      20            45
3  2  2      25            45
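The key point is that transform returns a result aligned to the original rows rather than one row per group. As a hedged sketch (assuming a reasonably recent pandas, where passing the aggregation's name as a string is the usual idiom), the same pattern covers other per-group statistics; the column names mean_values_A and count_A are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2], 'values': np.arange(10, 30, 5)})

# transform returns a Series aligned to df's original index,
# so every row receives the aggregate of its own group in A.
grouped = df.groupby('A')['values']
df['sum_values_A'] = grouped.transform('sum')
df['mean_values_A'] = grouped.transform('mean')  # hypothetical extra column
df['count_A'] = grouped.transform('count')       # hypothetical extra column
print(df)
```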

score 8 · Answer 2 · answered Nov 06 '12 at 18:36

I found a way using join:

In [101]:
aggregated = df.groupby('A')['values'].sum()
aggregated.name = 'sum_values_A'
df.join(aggregated, on='A')

Out[101]:
   A  B  values  sum_values_A
0  1  1      10            25
1  1  2      15            25
2  2  1      20            45
3  2  2      25            45
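For comparison, the same lookup can be sketched with DataFrame.merge, which some readers find more explicit about which column is the key. This is an alternative spelling of the join above, not a different technique:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2], 'values': np.arange(10, 30, 5)})

# One row per group, with columns A and sum_values_A.
aggregated = (
    df.groupby('A')['values']
      .sum()
      .rename('sum_values_A')
      .reset_index()
)

# A left merge on the key column copies each group total back to its rows,
# keeping the row order of df.
result = df.merge(aggregated, on='A', how='left')
print(result)
```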

Does anyone have a simpler way to do it?

score 4 · Answer 3 · answered Nov 06 '12 at 18:49

This is not so direct, but I find it very intuitive (using map to create a new column from another column), and it can be applied to many other cases:

gb = df.groupby('A')['values'].sum()  # group totals, indexed by the values of A

def getvalue(x):
    # look up the total for group x
    return gb[x]

df['sum'] = df['A'].map(getvalue)
df
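Series.map also accepts a Series (or a dict) directly, using its index as the lookup keys, so the helper function is not strictly needed. A minimal sketch on the question's data; the column name sum_values_A is chosen here for consistency with the other answers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2], 'values': np.arange(10, 30, 5)})

# Series of group totals, indexed by the values of A.
gb = df.groupby('A')['values'].sum()

# map looks up each entry of df['A'] in gb's index.
df['sum_values_A'] = df['A'].map(gb)
print(df)
```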

answered by joaquin

Thanks, the map method seems pretty powerful. Will certainly use it often. – foglerit Nov 06 '12 at 22:04

score 3 · Answer 4 · answered Nov 06 '12 at 21:26

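In [15]: def sum_col(group, col, new_col):
   ....:     group = group.copy()  # avoid mutating the group pandas passes in
   ....:     group[new_col] = group[col].sum()
   ....:     return group

In [16]: df.groupby("A", group_keys=False).apply(sum_col, 'values', 'sum_values_A')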
Out[16]: 
   A  B  values  sum_values_A
0  1  1      10            25
1  1  2      15            25
2  2  1      20            45
3  2  2      25            45
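Since several answers build the same column, a quick sketch that checks three of them against each other on the question's data may be reassuring. groupby(...).apply is left out here because its handling of the grouping column and of the result index has changed across pandas releases:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2], 'values': np.arange(10, 30, 5)})

totals = df.groupby('A')['values'].sum()

# Each approach yields a Series aligned to df's rows.
via_transform = df.groupby('A')['values'].transform('sum')
via_join = df.join(totals.rename('sum_values_A'), on='A')['sum_values_A']
via_map = df['A'].map(totals)

for name, s in [('transform', via_transform), ('join', via_join), ('map', via_map)]:
    print(name, s.tolist())
```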
