I would like to group the rows of a dataframe by one column. The result should be a new dataframe in which I can choose, per column, which aggregation function makes sense. The default should simply be the value of the first entry in the group.
(It would be nice if the solution also worked for a combination of two columns.)
Example
#!/usr/bin/env python
"""Test data frame grouping."""
# 3rd party modules
import pandas as pd
df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price': 7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price': 42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price': 1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price': 2, 'name': 'david', 'amount': 100}])
print(df)
gives the dataframe:
   amount  id     name  price
0       1   1     anna    123
1       2   1     anna      7
2      30   2      bob     42
3      10   3  charlie      1
4     100   3    david      2
And I would like to get:
amount  id     name  price
     3   1     anna    130
    30   2      bob     42
   110   3  charlie      3
So:
- Entries with the same value in the id column belong together. After that operation, there should still be an id column, but it should have only unique values.
- All values in amount and price which have the same id get summed up.
- For name, just the first one (by the current order of the dataframe) is taken.
Is this possible with Pandas?
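To make the request concrete, here is a rough sketch of the kind of call I have in mind. It assumes that groupby(...).agg(...) accepts a dict mapping column names to aggregation names, and that 'first' can serve as the default. The helper group_with_default and its parameters are hypothetical names of my own, not part of pandas.

```python
import pandas as pd


def group_with_default(df, keys, aggs=None, default="first"):
    """Hypothetical helper: group by `keys`, aggregate the other columns.

    Columns listed in `aggs` use the given function; all remaining
    non-key columns fall back to `default`.
    """
    keys = [keys] if isinstance(keys, str) else list(keys)
    aggs = aggs or {}
    # Build one aggregation per non-key column: the override or the default.
    spec = {col: aggs.get(col, default)
            for col in df.columns if col not in keys}
    # as_index=False keeps the key columns as ordinary columns;
    # sort=False keeps groups in order of first appearance.
    return df.groupby(keys, as_index=False, sort=False).agg(spec)


df = pd.DataFrame([{'id': 1, 'price': 123, 'name': 'anna', 'amount': 1},
                   {'id': 1, 'price': 7, 'name': 'anna', 'amount': 2},
                   {'id': 2, 'price': 42, 'name': 'bob', 'amount': 30},
                   {'id': 3, 'price': 1, 'name': 'charlie', 'amount': 10},
                   {'id': 3, 'price': 2, 'name': 'david', 'amount': 100}])

# Group by a single column, summing price and amount.
result = group_with_default(df, 'id', {'price': 'sum', 'amount': 'sum'})
print(result)

# Group by a combination of two columns.
result_two = group_with_default(df, ['id', 'name'],
                                {'price': 'sum', 'amount': 'sum'})
print(result_two)
```

One caveat with this sketch: as far as I understand, 'first' in a groupby picks the first non-null value of each column, so it would differ from "the first row of the group" if the data contained missing values.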