Solution for SpecificationError: nested renamer is not supported while agg() along with groupby()

Question

def stack_plot(data, xtick, col2='project_is_approved', col3='total'):
    ind = np.arange(data.shape[0])

    plt.figure(figsize=(20,5))
    p1 = plt.bar(ind, data[col3].values)
    p2 = plt.bar(ind, data[col2].values)

    plt.ylabel('Projects')
    plt.title('Number of projects approved vs rejected')
    plt.xticks(ind, list(data[xtick].values))
    plt.legend((p1[0], p2[0]), ('total', 'accepted'))
    plt.show()

def univariate_barplots(data, col1, col2='project_is_approved', top=False):
    # Count number of zeros in dataframe python: https://stackoverflow.com/a/51540521/4084039
    temp = pd.DataFrame(project_data.groupby(col1)[col2].agg(lambda x: x.eq(1).sum())).reset_index()

    # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
    temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

    temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

    temp.sort_values(by=['total'],inplace=True, ascending=False)

    if top:
        temp = temp[0:top]

    stack_plot(temp, xtick=col1, col2=col2, col3='total')
    print(temp.head(5))
    print("="*50)
    print(temp.tail(5))

univariate_barplots(project_data, 'school_state', 'project_is_approved', False)

Error:

SpecificationError                        Traceback (most recent call last)
<ipython-input-21-2cace8f16608> in <module>()
----> 1 univariate_barplots(project_data, 'school_state', 'project_is_approved', False)

<ipython-input-20-856fcc83737b> in univariate_barplots(data, col1, col2, top)
      4 
      5     # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
----> 6     temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
      7     print (temp['total'].head(2))
      8     temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, *args, **kwargs)
    251             # but not the class list / tuple itself.
    252             func = _maybe_mangle_lambdas(func)
--> 253             ret = self._aggregate_multiple_funcs(func)
    254             if relabeling:
    255                 ret.columns = columns

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in _aggregate_multiple_funcs(self, arg)
    292             # GH 15931
    293             if isinstance(self._selected_obj, Series):
--> 294                 raise SpecificationError("nested renamer is not supported")
    295 
    296             columns = list(arg.keys())

SpecificationError: **nested renamer is not supported**

As much as code-only answers are discouraged on StackOverflow, so is code-only posts. Please explain something of this process. — Parfait, Feb 14 '20 at 15:48
You may also get this error if you try to aggregate and one or more of the columns are not present in the data frame — Confusion Matrix, Jun 24 '20 at 06:06
@ConfusionMatrix wish I had seen this earlier - this was a very useful pointer with the not-so-intuitive error message. Thank you! — Anne, Apr 13 '21 at 07:35

score 72 · Accepted Answer · edited Jul 05 '23 at 15:44

In this specific case you can change

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

to the new syntax

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(total='count')).reset_index()['total']
temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(Avg='mean')).reset_index()['Avg']

New syntax:

df.groupby(
    [columns]
).agg(
    new_column_name=("column", aggregation_function)
)

The reason is that named aggregation, introduced in pandas 0.25, is the recommended replacement for the deprecated dict-of-dicts approach to naming the output of column-specific aggregations (Deprecate groupby.agg() with a dictionary when renaming).

source: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html
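A minimal sketch of the old and new spellings side by side. The toy `project_data` frame below is an assumption made up for illustration (the real dataset is not shown in the question); only the column names come from the question. On a single selected column the keyword value is just the function, while on the whole grouped frame it is a `(column, function)` tuple.

```python
import pandas as pd

# Hypothetical toy data standing in for project_data from the question.
project_data = pd.DataFrame({
    "school_state": ["CA", "CA", "NY", "NY", "NY", "TX"],
    "project_is_approved": [1, 0, 1, 1, 0, 1],
})

grouped = project_data.groupby("school_state")["project_is_approved"]

# Old style: a renaming dict on a single selected column.
# On pandas 1.0 and later this raises SpecificationError.
try:
    grouped.agg({"total": "count"})
    old_style_error = None
except Exception as exc:
    old_style_error = type(exc).__name__

# Named aggregation on a single column: keyword name = function.
temp = grouped.agg(
    approved=lambda x: x.eq(1).sum(),
    total="count",
    Avg="mean",
).reset_index()

# Named aggregation on the grouped frame: keyword name = (column, function).
temp_df = project_data.groupby("school_state").agg(
    total=("project_is_approved", "count"),
    Avg=("project_is_approved", "mean"),
).reset_index()

print(old_style_error)
print(temp)
```

Computing all three columns in one `agg` call also removes the need to build `temp` piece by piece and rely on the rows lining up.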

how would you put multiple function inside the aggregate ? for example adding min and max too — Areza, Feb 06 '21 at 20:14
add them as keyword arguments, like `.agg(avg="mean", total="count")` — jkr, Feb 08 '22 at 03:51

score 60 · Answer 2 · tsorn · edited Jun 06 '20 at 07:17

This error also happens if a column specified in the aggregation function dict does not exist in the dataframe:

In [190]: group = pd.DataFrame([[1, 2]], columns=['A', 'B']).groupby('A')
In [195]: group.agg({'B': 'mean'})
Out[195]: 
   B
A   
1  2

In [196]: group.agg({'B': 'mean', 'non-existing-column': 'mean'})
...
SpecificationError: nested renamer is not supported

edited Jun 06 '20 at 07:17

answered Mar 16 '20 at 16:54

tsorn

This answer points to the actual source of the error. The other answer indicating there is another way to specify may be true but does not get to the root cause. – Mark Andersen Jul 02 '20 at 17:04
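One way to guard against this case is to compare the aggregation dict with the frame's columns before calling `agg`. This is a sketch, not part of the answer above: the names `agg_spec`, `missing`, and `valid_spec` are made up for illustration, and silently dropping the missing keys is only one possible policy (raising a clear error on `missing` is another).

```python
import pandas as pd

df = pd.DataFrame([[1, 2]], columns=["A", "B"])
agg_spec = {"B": "mean", "non-existing-column": "mean"}

# Columns named in the spec that the frame does not have.
missing = [col for col in agg_spec if col not in df.columns]
if missing:
    print("Columns not found in the dataframe:", missing)

# Aggregate only the columns that actually exist.
valid_spec = {col: func for col, func in agg_spec.items() if col in df.columns}
result = df.groupby("A").agg(valid_spec)
print(result)
```

Checking first gives a message that names the offending column, whichever exception your pandas version would otherwise raise.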

score 6 · Answer 3 · answered Nov 18 '20 at 19:59

I found a way. Instead of writing

g2 = df.groupby(["Description","CustomerID"],as_index=False).agg({'Quantity':{"maxQ":np.max,"minQ":np.min,"meanQ":np.mean}})
g2.columns = ["Description","CustomerID","maxQ","minQ",'meanQ']

Do as follows:

g2 = df.groupby(["Description","CustomerID"],as_index=False).agg({'Quantity':{np.max,np.min,np.mean}})
g2.columns = ["Description","CustomerID","maxQ","minQ",'meanQ']

I had the same error and this is how I resolved it!

Arju Aman

Please note! The order of columns returned by group by is not the same as the order of aggregations included within 'agg'. If this is ignored, then all the numbers returned will be incorrect – Bagavathi Mar 20 '23 at 15:34
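The ordering concern in the comment above is consistent with how the workaround is written: `{np.max, np.min, np.mean}` is a Python set, which has no guaranteed iteration order, so the hand-assigned column names may not match the values. A sketch that avoids assigning names by position, using named aggregation instead (the sample data is hypothetical; only the column names come from the answer):

```python
import pandas as pd

# Hypothetical data using the column names from the answer above.
df = pd.DataFrame({
    "Description": ["pen", "pen", "pen", "cup"],
    "CustomerID": [1, 1, 2, 1],
    "Quantity": [3, 7, 5, 2],
})

# Each output column is tied to its aggregation by name, not by position.
g2 = df.groupby(["Description", "CustomerID"], as_index=False).agg(
    maxQ=("Quantity", "max"),
    minQ=("Quantity", "min"),
    meanQ=("Quantity", "mean"),
)
print(g2)
```

Because the names are attached in the `agg` call itself, no separate `g2.columns = [...]` step is needed.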

score 3 · Answer 4 · answered Feb 14 '20 at 17:02

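Do you still get the error if you change

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

to

temp['total'] = project_data.groupby(col1)[col2].agg(total='count').reset_index()['total']

Note that when a single column is already selected with [col2], the keyword value is just the function; the (column, function) tuple form applies when aggregating the whole grouped dataframe.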

score 3 · Answer 5 · Janitha Nawarathna · edited Jun 21 '20 at 13:29

Instead of using .agg({'total':'count'}), you can pass the name together with the function as a list of tuples, like .agg([('total', 'count')]), and do the same for Avg. Hope this works.

edited Jun 21 '20 at 13:29

answered Jun 21 '20 at 13:23

Janitha Nawarathna

This solution has the benefit of having the resulting columns named appropriately. – JoAnn Alvarez Dec 03 '20 at 06:00
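A short sketch of the list-of-tuples form applied to the question's columns. The toy `project_data` frame is an assumption for illustration; each tuple is `(output_name, function)`.

```python
import pandas as pd

# Hypothetical toy data standing in for project_data from the question.
project_data = pd.DataFrame({
    "school_state": ["CA", "CA", "NY", "NY", "NY", "TX"],
    "project_is_approved": [1, 0, 1, 1, 0, 1],
})

# A list of (name, function) tuples names each output column directly.
temp = (
    project_data.groupby("school_state")["project_is_approved"]
    .agg([("total", "count"), ("Avg", "mean")])
    .reset_index()
)
print(temp)
```

Because the functions are given in a list, the output columns keep the order in which the tuples are written.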

score 1 · Answer 6 · answered Apr 01 '20 at 14:52

I had a similar issue to @akshay jindal. After checking the documentation as suggested by @artikay Khanna, the problem was solved: some functions have changed and the old form is deprecated. Here is the warning printed the last time I ran the code.

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: FutureWarning: using a dict on a Series for aggregation
is deprecated and will be removed in a future version. Use                 named aggregation instead.

    >>> grouper.agg(name_1=func_1, name_2=func_2)

  """Entry point for launching an IPython kernel.

Therefore, I suggest trying

grouper.agg(name_1=func_1, name_2=func_2)

Hope this will help

score 1 · Answer 7 · answered Jun 26 '20 at 11:10

Not a very elegant solution, but it works. Renaming columns the way you are doing it is deprecated, but there is a workaround. Create a temporary variable 'approved' and store the col2 column of temp in it, because when you rebuild temp with agg, that column is replaced by the newly named columns. You could keep the original column name instead, but then the values in that column would change. So, to preserve the original values and still get two new columns with the desired names, you can use the following code.

approved = temp[col2]
temp = pd.DataFrame(project_data.groupby(col1)[col2].agg([('Avg','mean'),('total','count')]).reset_index())
temp[col2] = approved

P.S : Seems like an assignment of AAIC, I am working on same :)

score 1 · Answer 8 · answered Jul 01 '20 at 17:13

Sometimes it's convenient to keep an aggdict of how each column should be transformed under aggregation that will work with different column sets and different group by columns. You can do this with the new syntax fairly easily by unpacking the dict with **. Here's a minimal working example for simple data.

dfx=pd.DataFrame(columns=["A","B","C"],data=np.random.randint(0,5,size=(10,3)))
#dfx
#
#   A  B  C
#0  4  4  1
#1  2  4  4
#2  1  3  3
#3  2  4  3
#4  1  2  1
#5  0  4  2
#6  2  3  4
#7  1  0  2
#8  2  1  4
#9  3  0  3

Maybe when you agg you want the first "A", the last "B", the mean "C" and sometimes your pipeline has a "D" (but not this time) that you also want the mean of.

aggdict = {"A": lambda x: x.iloc[0], "B": lambda x: x.iloc[-1], "C": "mean", "D": "mean"}

You can build a simple dict like the old days and then unpack it with ** filtering on the relevant keys:

gb_col="C"
gbc = dfx.groupby(gb_col).agg(**{k:(k,v) for k,v in aggdict.items() if k in dfx.columns and k != gb_col})
#       A  B
#C      
#1  4  2
#2  0  0
#3  1  4
#4  2  3

And then you can slice and dice how you want with the same syntax:

mygb = lambda gb_col: dfx.groupby(gb_col).agg(**{k:(k,v) for k,v in aggdict.items() if k in dfx.columns and k != gb_col})
allgb = [mygb(c) for c in dfx.columns]

score 0 · Answer 9 · answered Apr 10 '20 at 11:52

I tried all the solutions, and the error turned out to be with the column name. If your column name contains built-in keywords such as "in" or "is", it throws this error. In my case, the column name was "Points in Polygon", and I resolved the issue by renaming the column to "Points".

score 0 · Answer 10 · answered Jun 08 '20 at 19:26

@Rishi's solution worked for me. The original name of the column in my dataframe was net_value_budgeted_rate, which was essentially dollar value of the sale. I changed it to dollars and it worked.

States.the.Obvious

score 0 · Answer 11 · answered Aug 24 '21 at 21:34

Info = (
    df.groupby("school_state")
    .agg(
        Approved=("project_is_approved", lambda x: x.eq(1).sum()),
        Total=("project_is_approved", "count"),
        Avg=("project_is_approved", "mean"),
    )
    .reset_index()
    .sort_values(by=["Total"], ascending=False)
    .head()
)

You can break this into individual commands for better readability.
