0

I am aggregating 10-20 cols with different functions on them. I want to provide different names to the cols that I'm aggregating. For eg. if this is my database:

A    B    C..   X    Y
1    a    22..  14   15
1    b    20..  37   1 
1    c    8..   18   10
1    d    10..  6    7
2    3e   13..  4    3
2    f    10..  12   15
3    g    0..   5    115

The standard deviation for col X grouped by cols A, B, C, should have field name 'stddev_k' and the mean for col Y grouped by the same cols should have field name 'randname' (The new fieldname cannot be derived from the old field name to which the function is being applied.)

Currently I've solved as follows:

  1. I run a groupby on one col at a time.
  2. I then rename the col
  3. I add the renamed col to dataframe D, which starts as an empty dataframe.

The above 3 run on a for-loop until all the cols I want to aggregate are added to dataframe D

This is a time consuming and lengthy process. I am looking for a faster way to solve this. I found this: Pandas Groupby - naming aggregate output column, which combines my step 2 and 3.

However this solution again works one col at a time and I will still need the time consuming for-loop. Is it possible to rename aggregated cols without doing it one column at a time?

Thanks

SModi
  • 125
  • 14
  • Look at cs95's solution which uses Named aggregations. This will allow you to apply different aggregation functions to different columns and specify a name for the resulting column. – ALollz Oct 02 '20 at 14:51
  • 1
    i.e. `df.groupby(['A', 'B', 'C']).agg(stddev_k=('X', 'std'), randname=('Y', 'mean'))` – ALollz Oct 02 '20 at 14:54

0 Answers0