58
def stack_plot(data, xtick, col2='project_is_approved', col3='total'):
    ind = np.arange(data.shape[0])

    plt.figure(figsize=(20,5))
    p1 = plt.bar(ind, data[col3].values)
    p2 = plt.bar(ind, data[col2].values)

    plt.ylabel('Projects')
    plt.title('Number of projects aproved vs rejected')
    plt.xticks(ind, list(data[xtick].values))
    plt.legend((p1[0], p2[0]), ('total', 'accepted'))
    plt.show()

def univariate_barplots(data, col1, col2='project_is_approved', top=False):
    # Count number of zeros in dataframe python: https://stackoverflow.com/a/51540521/4084039
    temp = pd.DataFrame(project_data.groupby(col1)[col2].agg(lambda x: x.eq(1).sum())).reset_index()

    # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
    temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

    temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

    temp.sort_values(by=['total'],inplace=True, ascending=False)

    if top:
        temp = temp[0:top]

    stack_plot(temp, xtick=col1, col2=col2, col3='total')
    print(temp.head(5))
    print("="*50)
    print(temp.tail(5))

univariate_barplots(project_data, 'school_state', 'project_is_approved', False)

Error:

SpecificationError                        Traceback (most recent call last)
<ipython-input-21-2cace8f16608> in <module>()
----> 1 univariate_barplots(project_data, 'school_state', 'project_is_approved', False)

<ipython-input-20-856fcc83737b> in univariate_barplots(data, col1, col2, top)
      4 
      5     # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
----> 6     temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
      7     print (temp['total'].head(2))
      8     temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, *args, **kwargs)
    251             # but not the class list / tuple itself.
    252             func = _maybe_mangle_lambdas(func)
--> 253             ret = self._aggregate_multiple_funcs(func)
    254             if relabeling:
    255                 ret.columns = columns

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in _aggregate_multiple_funcs(self, arg)
    292             # GH 15931
    293             if isinstance(self._selected_obj, Series):
--> 294                 raise SpecificationError("nested renamer is not supported")
    295 
    296             columns = list(arg.keys())

SpecificationError: **nested renamer is not supported**
Yann
  • 2,426
  • 1
  • 16
  • 33
Akshay Jindal
  • 607
  • 1
  • 5
  • 4
  • 8
    As much as code-only answers are discouraged on StackOverflow, so is code-only posts. Please explain something of this process. – Parfait Feb 14 '20 at 15:48
  • 7
    You may also get this error if you try to aggregate and one or more of the columns are not present in the data frame – Confusion Matrix Jun 24 '20 at 06:06
  • @ConfusionMatrix wish I had seen this earlier - this was a very useful pointer with the not-so-intuitive error message. Thank you! – Anne Apr 13 '21 at 07:35

11 Answers11

72

In this specific case you can change

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

to the new syntax

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(total='count')).reset_index()['total']
temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(Avg='mean')).reset_index()['Avg']

New syntax:

df.groupby(
    [columns]
).agg(
    new_column_name=("column", aggregation_function)
)

The reason for this is that the new pandas version named aggregation is the recommended replacement for the deprecated dict-of-dicts approach to naming the output of column-specific aggregations (Deprecate groupby.agg() with a dictionary when renaming).

source: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html

Roelant
  • 4,508
  • 1
  • 32
  • 62
Kartikay Khanna
  • 852
  • 5
  • 4
60

This error also happens if a column specified in the aggregation function dict does not exist in the dataframe:

In [190]: group = pd.DataFrame([[1, 2]], columns=['A', 'B']).groupby('A')
In [195]: group.agg({'B': 'mean'})
Out[195]: 
   B
A   
1  2

In [196]: group.agg({'B': 'mean', 'non-existing-column': 'mean'})
...
SpecificationError: nested renamer is not supported

tsorn
  • 3,365
  • 1
  • 29
  • 48
  • 7
    This answer points to the actual source of the error. The other answer indicating there is another way to specify may be true but does not get to the root cause. – Mark Andersen Jul 02 '20 at 17:04
6

I found the way: Instead of going like

g2 = df.groupby(["Description","CustomerID"],as_index=False).agg({'Quantity':{"maxQ":np.max,"minQ":np.min,"meanQ":np.mean}})
g2.columns = ["Description","CustomerID","maxQ","minQ",'meanQ']

Do as follows:

g2 = df.groupby(["Description","CustomerID"],as_index=False).agg({'Quantity':{np.max,np.min,np.mean}})
g2.columns = ["Description","CustomerID","maxQ","minQ",'meanQ']

I had the same error and this is how I resolved it!

Arju Aman
  • 412
  • 1
  • 5
  • 15
  • Please note! The order of columns returned by group by is not the same as the order of aggregations included within 'agg'. If this is ignored, then all the numbers returned will be incorrect – Bagavathi Mar 20 '23 at 15:34
3

Do you get the same error if you change

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

to

temp['total'] = project_data.groupby(col1)[col2].agg(total=('total','count')).reset_index()['total']
kait
  • 1,327
  • 9
  • 13
3

Instead of using .agg({'total':'count'})), you can pass name with the function as a list of tuple like .agg([('total', 'count')])and use the same for Avg also. Hope it would work.

1

I have got the similar issue as @akshay jindal, but I check the documentation as suggested by @artikay Khanna, the problem solved, some functions has been adjusted, the old is deprecated. Here is the code warning provided per last time execute.

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: FutureWarning: using a dict on a Series for aggregation
is deprecated and will be removed in a future version. Use                 named aggregation instead.

    >>> grouper.agg(name_1=func_1, name_2=func_2)

  """Entry point for launching an IPython kernel.

Therefore, I will suggest try

grouper.agg(name_1=func_1, name_2=func_2)

Hope this will help

Yeo Keat
  • 143
  • 1
  • 9
1

Not a very elegant solution but this one works. As renaming the column is deprecated with the way you are doing. But there is work around. Create a temporary variable 'approved' , store the col2 in it. Because when you apply agg function , the original column values will change with column name. You can preserve the column name but then values in those column will change. So in order to preserve the original dataframe and to have two new columns with desired names, you can use the following code.

approved = temp[col2]
temp = pd.DataFrame(project_data.groupby(col1)[col2].agg([('Avg','mean'),('total','count')]).reset_index())
temp[col2] = approved

P.S : Seems like an assignment of AAIC, I am working on same :)

Rahul Sonvane
  • 3,737
  • 7
  • 18
  • 21
1

Sometimes it's convenient to keep an aggdict of how each column should be transformed under aggregation that will work with different column sets and different group by columns. You can do this with the new syntax fairly easily by unpacking the dict with **. Here's a minimal working example for simple data.

dfx=pd.DataFrame(columns=["A","B","C"],data=np.random.randint(0,5,size=(10,3)))
#dfx
#
#   A  B  C
#0  4  4  1
#1  2  4  4
#2  1  3  3
#3  2  4  3
#4  1  2  1
#5  0  4  2
#6  2  3  4
#7  1  0  2
#8  2  1  4
#9  3  0  3

Maybe when you agg you want the first "A", the last "B", the mean "C" and sometimes your pipeline has a "D" (but not this time) that you also want the mean of.

aggdict = {"A":lambda x: x.iloc[0], "B": lambda x: x.iloc[-1], "C" : "mean" , "D":lambda x: "mean"}

You can build a simple dict like the old days and then unpack it with ** filtering on the relevant keys:

gb_col="C"
gbc = dfx.groupby(gb_col).agg(**{k:(k,v) for k,v in aggdict.items() if k in dfx.columns and k != gb_col})
#       A  B
#C      
#1  4  2
#2  0  0
#3  1  4
#4  2  3

And then you can slice and dice how you want with the same syntax:

mygb = lambda gb_col: dfx.groupby(gb_col).agg(**{k:(k,v) for k,v in aggdict.items() if k in dfx.columns and k != gb_col})
allgb = [mygb(c) for c in dfx.columns]
mmdanziger
  • 4,466
  • 2
  • 31
  • 47
0

I have tried alll the solutions and turned out to be the error with the name. If your column name has some inbuilt keywords such as "in", "is",etc., It is throwing error. In my case, My column name is "Points in Polygon" and I have resolved the issue by renaming the column to "Points"

Rishi
  • 66
  • 1
  • 9
0

@Rishi's solution worked for me. The original name of the column in my dataframe was net_value_budgeted_rate, which was essentially dollar value of the sale. I changed it to dollars and it worked.

0
Info = pd.DataFrame(df.groupby("school_state").agg(Approved=("project_is_approved",lambda x: x.eq(1).sum()),Total=("project_is_approved","count"),Avg=("project_is_approved","mean"))).reset_index().sort_values(by=["Total"],ascending=False).head()

You can break this into individual commands for better readability.

Jayakrishnan
  • 563
  • 5
  • 13