How can I "merge" rows by same value in a column in Pandas with aggregation functions?

Question

I would like to group the rows of a dataframe by one column and get back a new dataframe in which I can choose the aggregation function for each of the other columns. The default should be the value of the first entry in the group.

(it would be nice if the solution also worked for a combination of two columns)

Example

#!/usr/bin/env python

"""Test data frame grouping."""

# 3rd party modules
import pandas as pd


df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price':   7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price':  42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price':   1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price':   2, 'name': 'david', 'amount': 100}])
print(df)

gives the dataframe:

   amount  id     name  price
0       1   1     anna    123
1       2   1     anna      7
2      30   2      bob     42
3      10   3  charlie      1
4     100   3    david      2

And I would like to get:

amount  id     name  price
     3   1     anna    130
    30   2      bob     42
   110   3  charlie      3

So:

Entries with the same value in the id column belong together. After that operation, there should still be an id column, but it should have only unique values.
All values in amount and price that have the same id get summed up.
For name, just the first one (by the current order of the dataframe) is taken.

Is this possible with Pandas?

What's wrong with `df_new = df.groupby(df['id']).aggregate({'price': 'sum', 'name': 'first', 'amount': 'sum'})`? Does that not work for your use case? — cs95, Oct 19 '17 at 09:31
Hahaha, ok, I didn't try it. I just thought this is what such a function should look like. Nice that it accidentally works. I'll edit my question and make that an answer. – Martin Thoma, Oct 19 '17 at 10:17

score 70 · Accepted Answer · answered Oct 19 '17 at 10:19

You are looking for

aggregation_functions = {'price': 'sum', 'amount': 'sum', 'name': 'first'}
df_new = df.groupby(df['id']).aggregate(aggregation_functions)

which gives

    price     name  amount
id                        
1     130     anna       3
2      42      bob      30
3       3  charlie     110
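
The question also asks for 'first' as the default and for grouping on a combination of two columns. A minimal sketch of both, assuming a hypothetical helper `merge_rows` (not part of pandas) that fills in 'first' for every column that has no explicit function:

```python
import pandas as pd

df = pd.DataFrame([
    {'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
    {'id': 1, 'price': 7, 'name': 'anna', 'amount': 2},
    {'id': 2, 'price': 42, 'name': 'bob', 'amount': 30},
    {'id': 3, 'price': 1, 'name': 'charlie', 'amount': 10},
    {'id': 3, 'price': 2, 'name': 'david', 'amount': 100},
])


def merge_rows(frame, keys, overrides):
    """Hypothetical helper: group by `keys`, use 'first' unless overridden."""
    keys = [keys] if isinstance(keys, str) else list(keys)
    # Start with 'first' for every non-key column, then apply the overrides.
    agg = {col: 'first' for col in frame.columns if col not in keys}
    agg.update(overrides)
    return frame.groupby(keys, as_index=False).agg(agg)


# Group by one column; only price and amount need an explicit function.
by_id = merge_rows(df, 'id', {'price': 'sum', 'amount': 'sum'})
print(by_id)

# Group by a combination of two columns by passing a list of keys.
by_id_and_name = merge_rows(df, ['id', 'name'], {'price': 'sum', 'amount': 'sum'})
print(by_id_and_name)
```

With two keys, charlie and david end up in separate groups, because they share an id but not a name.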

Martin Thoma

Is there a published list of available aggregate functions that can be applied to a column? For example, how did you know that 'first' was a valid function? I've been googling for such a list. I have found a lot of articles and tutorials that mention *many* of the valid functions, but no complete listing. – Daniel Goldfarb Feb 03 '19 at 03:47
I didn't know that first was in there. I just guessed it :-) To me, pandas is super intuitive – Martin Thoma Jun 19 '19 at 05:13
@Martin--- Thanks! – Daniel Goldfarb Jun 19 '19 at 20:28
@DanielGoldfarb check out this https://cmdlinetips.com/2019/10/pandas-groupby-13-functions-to-aggregate/ – Fanglin Jun 05 '20 at 22:38
@lifelogger awesome! Thanks! – Daniel Goldfarb Jun 07 '20 at 03:00
The full list of available aggregation functions is documented here: https://pandas.pydata.org/docs/reference/groupby.html – Daniel Goldfarb Jun 08 '20 at 22:46

score 23 · Answer 2 · answered Oct 19 '17 at 10:30

To keep the same column order as the original dataframe, add reindex, because aggregating with a dict does not preserve that order:

d = {'price': 'sum', 'name': 'first', 'amount': 'sum'}
df_new = df.groupby('id', as_index=False).aggregate(d).reindex(columns=df.columns)
print(df_new)
   amount  id     name  price
0       3   1     anna    130
1      30   2      bob     42
2     110   3  charlie      3
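
On pandas 0.25 or later, named aggregation is another way to control the order of the aggregated columns, without a separate reindex step. A sketch under that version assumption, using a shortened copy of the example data:

```python
import pandas as pd

df = pd.DataFrame([
    {'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
    {'id': 1, 'price': 7, 'name': 'anna', 'amount': 2},
    {'id': 2, 'price': 42, 'name': 'bob', 'amount': 30},
])

# Each keyword is an output column: (input column, aggregation function).
# The aggregated columns appear in the order the keywords are written.
df_named = df.groupby('id', as_index=False).agg(
    amount=('amount', 'sum'),
    name=('name', 'first'),
    price=('price', 'sum'),
)
print(df_named)
```

The grouping key `id` still comes first here, so reindex remains the simpler choice when `id` has to sit in another position.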

jezrael

I don't get what `as_index=False` does. Could you show me the difference? (+1 for reindex) – Martin Thoma Oct 19 '17 at 10:34
It keeps `id` as a regular column instead of turning it into the index, which is what happens in your answer. – jezrael Oct 19 '17 at 10:36
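
To make the difference concrete, here is a small sketch on a reduced, made-up version of the example data. It compares the default behaviour with `as_index=False`:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2], 'amount': [1, 2, 30]})

# Default: the grouping key becomes the index of the result.
indexed = df.groupby('id').agg({'amount': 'sum'})
print(indexed.index.name)      # the index is named after the key
print(list(indexed.columns))   # 'id' is not among the columns

# as_index=False: the grouping key stays an ordinary column.
flat = df.groupby('id', as_index=False).agg({'amount': 'sum'})
print(list(flat.columns))

# Calling reset_index() on the default result moves the key back to a column.
restored = indexed.reset_index()
print(list(restored.columns))
```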
