I am working through this tutorial: https://medium.com/@wangyuw/data-reshaping-with-pandas-explained-80b2f51f88d2

Everything works, but the following line of code generates a warning:

agg = long_df.reset_index().groupby(['RegionVariable', 'EXP'])[features].agg({'count': len, 'mean': np.mean})

The warning it creates is:

FutureWarning: using a dict with renaming is deprecated and will be removed
in a future version.

For column-specific groupby renaming, use named aggregation

    >>> df.groupby(...).agg(name=('column', aggfunc))

  return super().aggregate(arg, *args, **kwargs)

I tried to 'fix' it with this:

agg = long_df.reset_index().groupby(['RegionVariable', 'EXP'])[features].agg(name=(('count', len), ('mean', np.mean)))

But I get this error:

KeyError: "Column '('count', <built-in function len>)' does not exist!"

How can len not exist in the second version when it works in the first?

More to the point, what is the correct syntax to get this working without generating the deprecation warning?

Versions:

Python: 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit (Intel)]
NumPy: 1.18.1
Pandas: 0.25.3
MarkS

1 Answer

For reference, renaming with a dict in agg was deprecated back in 2017 (https://github.com/pandas-dev/pandas/issues/18366).

There are multiple approaches (from this SO Answer)

You can change your code to pass a list of function names instead of a dict:

agg = long_df.reset_index().groupby(['RegionVariable', 'EXP'])[features].agg(['count','mean'])
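To see what the list form produces, here is a minimal sketch on made-up data. The frame contents and the feature names `f1` and `f2` are assumptions for illustration, not the data from the tutorial. Note that `'count'` counts non-null values per group, while `len` in the original counts rows, so the two can differ when a column contains NaN; `'size'` is the closer match to `len` if that matters.

```python
import pandas as pd

# Hypothetical stand-in for long_df after reset_index(); the values are made up.
long_df = pd.DataFrame({
    "RegionVariable": ["A", "A", "B", "B"],
    "EXP": [1, 1, 2, 2],
    "f1": [1.0, 3.0, 5.0, 7.0],
    "f2": [2.0, 4.0, 6.0, 8.0],
})
features = ["f1", "f2"]  # assumed feature column names

# A list of function names applies every function to every selected column.
# The result has two-level columns: (feature, function).
agg = long_df.groupby(["RegionVariable", "EXP"])[features].agg(["count", "mean"])
print(agg)
```

Individual results are then addressed with a tuple, for example `agg[("f1", "mean")]`.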

or use something like a custom pd.Series.apply function.
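As for the KeyError in the question: with named aggregation, each keyword is the name of an output column and its value is a `(column, aggfunc)` tuple. In the attempted fix, `name=(('count', len), ('mean', np.mean))` was therefore read as "create an output column called `name` from the input column `('count', len)`", and no such column exists. Below is a sketch of the syntax the warning points to, again on made-up data with assumed feature names `f1` and `f2`; the output names such as `f1_count` are just one possible naming choice.

```python
import pandas as pd

# Hypothetical stand-in for long_df after reset_index(); the values are made up.
long_df = pd.DataFrame({
    "RegionVariable": ["A", "A", "B", "B"],
    "EXP": [1, 1, 2, 2],
    "f1": [1.0, 3.0, 5.0, 7.0],
    "f2": [2.0, 4.0, 6.0, 8.0],
})
features = ["f1", "f2"]  # assumed feature column names

# Build one keyword per output column: output_name=(input_column, aggfunc).
named = {}
for col in features:
    named[f"{col}_count"] = (col, "count")
    named[f"{col}_mean"] = (col, "mean")

# Named aggregation is available from pandas 0.25 onward.
agg = long_df.groupby(["RegionVariable", "EXP"])[features].agg(**named)
print(agg)
```

Unlike the list form, this gives flat column names, which can be easier to work with downstream.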

zglin