0

I want to collapse dataframe rows that match values for a given column but the rest of the columns have to be collapsed with different logic. Example:

City           ColumnA   ColumnB
Seattle        20        30
Seattle        30        20
Portland       25        25
Portland       10        40

I want to collapse by City and I want ColumnA to keep the lowest value and ColumnB to keep the mean value, for instance. The result should look like:

City           ColumnA   ColumnB
Seattle        20        25
Portland       10        32.5

This is just an example, in my real problem I want to apply a more complex logic rather than min() or mean().

What is the right, cleanest and simplest way of doing this? Thank you.

Didac Perez Parera
  • 3,734
  • 3
  • 52
  • 87
  • This is all covered in the relevant section of the [docs](https://pandas.pydata.org/pandas-docs/stable/groupby.html). – DSM Apr 27 '18 at 18:46

1 Answers1

1

use groubpy and .agg:

df.groupby('City', as_index=False).agg({'ColumnA':'min', 'ColumnB':'mean'})

       City  ColumnA  ColumnB
0  Portland       10     32.5
1   Seattle       20     25.0
sacuL
  • 49,704
  • 8
  • 81
  • 106
  • Thanks! what if the logic I want to apply is not min() or mean(), but something custom? – Didac Perez Parera Apr 27 '18 at 18:45
  • `.agg` can be a bit tricky, and it depends what you want to do, but take a look [here](https://stackoverflow.com/questions/26812763/applying-a-custom-groupby-aggregate-function-to-output-a-binary-outcome-in-panda), it might help – sacuL Apr 27 '18 at 18:55