Rename result columns from Pandas aggregation ("FutureWarning: using a dict with renaming is deprecated")

Question

I'm trying to do some aggregations on a pandas data frame. Here is a sample code:

import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})

Out[1]: 
      Amount      
         Sum Count
User              
user1   18.0     2
user2   20.5     3
user3   10.5     1

Which generates the following warning:

FutureWarning: using a dict with renaming is deprecated and will be removed in a future version return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)

How can I avoid this?

I'd love to know why this is being depreciated (I'm sure there is a good reason). Does anyone have a link to a discussion on it? — Stephen McAteer, Nov 24 '17 at 05:31
To focus on the keywords of the solution rather than just the existing warning, I retitled ***"rename result columns from aggregation"*** and tagged. Now people might even find this question :) ahead of (say) the not-so-canonical[Naming returned columns in Pandas aggregate function?](https://stackoverflow.com/questions/19078325/naming-returned-columns-in-pandas-aggregate-function) — smci, May 08 '18 at 23:32
Hopefully this will be addressed in https://github.com/pandas-dev/pandas/issues/18366 — Nickolay, May 28 '18 at 08:06
How would this work if I don't do a "groupby" but I'm doing "pivot" instead? — avloss, Jun 11 '18 at 12:57
See pandas 0.25.0 [aggregation relabeling](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html#groupby-aggregation-with-relabeling) — Scott Boston, Jul 30 '19 at 20:30

Ted Petrou · Accepted Answer · 2017-12-07T14:41:48.930

Use groupby `apply` and return a Series to rename columns

Use the groupby apply method to perform an aggregation that

Renames the columns
Allows for spaces in the names
Allows you to order the returned columns in any way you choose
Allows for interactions between columns
Returns a single level index and NOT a MultiIndex

To do this:

create a custom function that you pass to apply
This custom function is passed each group as a DataFrame
Return a Series
The index of the Series will be the new columns

Create fake data

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                  'Score': [9, 1, 8, 7, 7, 6, 9]})

create custom function that returns a Series
The variable x inside of my_agg is a DataFrame

def my_agg(x):
    names = {
        'Amount mean': x['Amount'].mean(),
        'Amount std':  x['Amount'].std(),
        'Amount range': x['Amount'].max() - x['Amount'].min(),
        'Score Max':  x['Score'].max(),
        'Score Sum': x['Score'].sum(),
        'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
                                   'Score Sum', 'Score Max', 'Amount Score Sum'])

Pass this custom function to the groupby apply method

df.groupby('User').apply(my_agg)

The big downside is that this function will be much slower than agg for the cythonized aggregations

Using a dictionary with groupby `agg` method

Using a dictionary of dictionaries was removed because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on how to improve this functionality in the future on github Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.

df.groupby('User')['Amount'].agg(['sum', 'count'])

Output

       sum  count
User              
user1  18.0      2
user2  20.5      3
user3  10.5      1

It is still possible to use a dictionary to explicitly denote different aggregations for different columns, like here if there was another numeric column named Other.

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
              "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
              'Other': [1,2,3,4,5,6]})

df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})

Output

      Amount       Other          
         sum count   max       std
User                              
user1   18.0     2     6  3.535534
user2   20.5     3     5  1.527525
user3   10.5     1     4       NaN

But suppose you do want the renaming aspect as well, to have different column names than the default after aggregation. Is there some syntax to still get this functionality? — ErnestScribbler, Sep 25 '17 at 14:13
Had same question about naming, since I am using the same col twice (one min and one max) and need a way to refer to them uniquely when I put results back in objects. — mgmonteleone, Oct 09 '17 at 22:40
If you want to rename the columns you will have to do it manually. Either use a list to replace all of them `df.columns = ['your', 'new', 'col', 'names']` or use the `rename` method which will be a bit difficult since you have this results in a MultiIndex. — Ted Petrou, Oct 09 '17 at 22:44
@ErnestScribbler I have updated this answer to show a fairly simple way of performing an aggregation, renaming and returning a single-level index. — Ted Petrou, Dec 07 '17 at 14:45
why don't you need to pass an argument to my_agg when you call it? — Snow, Sep 27 '18 at 11:20
I entered wrong column and got the same deprecated warning `df.groupby('User')['Amount'].agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})` — user3226167, Aug 20 '19 at 06:50

score 21 · Answer 2 · edited Jun 20 '20 at 09:12

21

Update for Pandas 0.25+ Aggregation relabeling

import pandas as pd

print(pd.__version__)
#0.25.0

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby("User")['Amount'].agg(Sum='sum', Count='count')

Output:

        Sum  Count
User              
user1  18.0      2
user2  20.5      3
user3  10.5      1

edited Jun 20 '20 at 09:12

Community

1
1

answered Jul 30 '19 at 20:29

Scott Boston

147,308
15
139
187

5

But in this case, the names `Sum` and `Count` have to be valid Python names. You couldn't replace `Sum` with `Sum of foos`. :( – Dror Jan 10 '20 at 13:18
2

@Dror you can use this format `df.groupby('User').agg(**{'sum of foos':pd.NamedAgg('Amount','sum'), 'count of foos':pd.NamedAgg('Amount','count')})` – Scott Boston Jan 10 '20 at 17:25
Indeed, this works, but it unfortunately, it is so much more cumbersome when compared to the dictionary based aggregation API. – Dror Jan 13 '20 at 07:07
2

@Dror You don't need to include the `pd.NamedAgg` part which shortens the amount of code. This is now my preferred way of doing this. Pass in a dictionary with `**` in front of it. – Corey Levinson Dec 11 '20 at 23:02

score 17 · Answer 3 · answered Jun 05 '18 at 09:44

17

If you replace the internal dictionary with a list of tuples it gets rid of the warning message

import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": [("Sum", "sum"), ("Count", "count")]})

answered Jun 05 '18 at 09:44

Jacob Stevenson

3,718
1
13
10

Do you know if this will also (presumably) work in the future release or is also just to workaround the warning message? – Peanut Jun 06 '18 at 12:42
@Peanut, I do not know. But if the warning message is not there, then presumably (as you say) it will continue to be supported. – Jacob Stevenson Jun 06 '18 at 16:30
8

This is an undocumented and accidental feature and I would highly suggest no one using this syntax as it may not work in the future. – Ted Petrou Jun 13 '18 at 12:27
Thanks for the information @TedPetrou. And thanks for the link to the discussion in your answer. It sounds like a difficult question to find the right syntax. – Jacob Stevenson Jun 13 '18 at 16:35
Is this indeed unstable behavior as @TedPetrou mentioned? – Dror Jan 10 '20 at 13:19

Anna · Answer 4 · 2021-08-10T14:33:29.583

This worked for me, Pandas version 1.2.4

For each column we add a list which consists of tuples:

df.groupby('column to group by').agg(
{'column name': [('new column name', 'function to apply')]})

Example

# Create DataFrame
df=pd.DataFrame(data={'id':[1,1,2,3],'col1': [1,2,1,5], 'col2':[5,8,6,4]})

# Apply grouping 
grouped = df.groupby('id').agg({
                             'col1': [('name1', 'sum')], 
                             'col2': [('name2_mean', 'sum'), ('name2_custom_std', lambda x: np.std(x))]})

# Drop multi-index for columns and reset index
grouped.columns = grouped.columns.droplevel()
grouped.reset_index(inplace=True)

Result:

	id	name1	name2_mean	name2_custom_std
0	1	3	13	1.5
1	2	1	6	0.0
2	3	5	4	0.0

score 3 · Answer 5 · answered Nov 12 '18 at 14:46

This is what I did:

Create a fake dataset:

import pandas as pd
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                  'Score': [9, 1, 8, 7, 7, 6, 9]})
df

O/P:

    Amount  Score   User
0   10.0    9   user1
1   5.0 1   user2
2   8.0 8   user2
3   10.5    7   user3
4   7.5 7   user2
5   8.0 6   user1
6   9.0 9   user3

I first made the User the index, and then a groupby:

ans = df.set_index('User').groupby(level=0)['Amount'].agg([('Sum','sum'),('Count','count')])
ans

Solution:

    Sum Count
User        
user1   18.0    2
user2   20.5    3
user3   19.5    2

score 0 · Answer 6 · answered Mar 26 '19 at 12:27

Replace the inner dictionaries with a list of correctly named functions.

To rename the function I'm using this utility function:

def aliased_aggr(aggr, name):
    if isinstance(aggr,str):
        def f(data):
            return data.agg(aggr)
    else:
        def f(data):
            return aggr(data)
    f.__name__ = name
    return f

The group-by statement then becomes:


df.groupby(["User"]).agg({"Amount": [ 
    aliased_aggr("sum","Sum"),
    aliased_aggr("count","Count")
]

If you have bigger, reusable aggregation specs, you can convert them with

def convert_aggr_spec(aggr_spec):
    return {
        col : [ 
            aliased_aggr(aggr,alias) for alias, aggr in aggr_map.items() 
        ]  
        for col, aggr_map in aggr_spec.items() 
    }

So you can say

df.groupby(["User"]).agg(convert_aggr_spec({"Amount": {"Sum": "sum", "Count": "count"}}))

See also https://github.com/pandas-dev/pandas/issues/18366#issuecomment-476597674

Ok, I got carried away with this: this is essentially recreating what already is in pandas, as illustrated in [this solution](https://stackoverflow.com/a/50697003/1419571) proposed by @jacob-stevenson — plankthom, Mar 26 '19 at 12:50
... but there are other circumstances where it is useful to carry the aliases in the aggregation spec items themselves. So i leave the answer as is. — plankthom, Mar 26 '19 at 12:57

Rename result columns from Pandas aggregation ("FutureWarning: using a dict with renaming is deprecated")

6 Answers6

Use groupby `apply` and return a Series to rename columns

Using a dictionary with groupby `agg` method

Update for Pandas 0.25+ Aggregation relabeling

Linked

Related

Rename result columns from Pandas aggregation ("FutureWarning: using a dict with renaming is deprecated")

6 Answers6

Use groupby apply and return a Series to rename columns

Using a dictionary with groupby agg method

Update for Pandas 0.25+ Aggregation relabeling

Linked

Related

Use groupby `apply` and return a Series to rename columns

Using a dictionary with groupby `agg` method