Collapse rows in Pandas dataframe with different logic per column

Question

I want to collapse dataframe rows that match values for a given column but the rest of the columns have to be collapsed with different logic. Example:

City           ColumnA   ColumnB
Seattle        20        30
Seattle        30        20
Portland       25        25
Portland       10        40

I want to collapse by City and I want ColumnA to keep the lowest value and ColumnB to keep the mean value, for instance. The result should look like:

City           ColumnA   ColumnB
Seattle        20        25
Portland       10        32.5

This is just an example, in my real problem I want to apply a more complex logic rather than min() or mean().

What is the right, cleanest and simplest way of doing this? Thank you.

This is all covered in the relevant section of the [docs](https://pandas.pydata.org/pandas-docs/stable/groupby.html). — DSM, Apr 27 '18 at 18:46

score 1 · Accepted Answer · answered Apr 27 '18 at 18:44

1

use groubpy and .agg:

df.groupby('City', as_index=False).agg({'ColumnA':'min', 'ColumnB':'mean'})

       City  ColumnA  ColumnB
0  Portland       10     32.5
1   Seattle       20     25.0

answered Apr 27 '18 at 18:44

sacuL

49,704
8
81
106

Thanks! what if the logic I want to apply is not min() or mean(), but something custom? – Didac Perez Parera Apr 27 '18 at 18:45
`.agg` can be a bit tricky, and it depends what you want to do, but take a look [here](https://stackoverflow.com/questions/26812763/applying-a-custom-groupby-aggregate-function-to-output-a-binary-outcome-in-panda), it might help – sacuL Apr 27 '18 at 18:55

Collapse rows in Pandas dataframe with different logic per column

1 Answers1