Use a function instead of string in Pandas Groupby Agg

Asked Apr 17 '19 at 15:27

Active Apr 17 '19 at 15:38

When aggregating data in Pandas, I can pass strings like "count", "sum", "mean", etc. to agg. Are there functions I can use instead of strings that provide equivalent behavior? For example, if I pass pd.Series.count instead of "count", the runtime takes a sizable hit (roughly 3.4x in the timings below).

import pandas as pd
import numpy as np

n = 10000000
# 20M rows: "a" holds integer group keys 0-99; "b" is n floats followed by n missing values
df_nan = pd.DataFrame({"a": np.random.randint(0, 100, n*2),
                       "b": np.linspace(0, 100, n).tolist() + [None]*n})



%timeit df_nan.groupby("a").agg({"b": pd.Series.count})
1.63 s ± 28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df_nan.groupby("a").agg({"b": "count"})
479 ms ± 18.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
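A scaled-down check suggests that the string alias and pd.Series.count produce the same per-group counts, so the difference in the timings above is speed rather than behavior. The sizes (1,000 non-null rows, 10 group keys) and the use of NaN in place of None are assumptions made so it runs quickly; check_dtype=False is there because I'm not certain both paths return the same integer dtype in every pandas version.

```python
import numpy as np
import pandas as pd

# Scaled-down frame shaped like the one above (assumed sizes): integer keys, half the "b" values missing
rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "a": rng.integers(0, 10, n * 2),
    "b": np.concatenate([np.linspace(0, 100, n), np.full(n, np.nan)]),
})

# The same aggregation requested two ways
by_string = df.groupby("a").agg({"b": "count"})
by_callable = df.groupby("a").agg({"b": pd.Series.count})

# Both count the non-null "b" values per group; dtype check relaxed in case the two paths differ
pd.testing.assert_frame_equal(by_string, by_callable, check_dtype=False)
print(by_string["b"].sum())  # 1000, the total number of non-null values
```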

Any idea what function I could pass instead?
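If the goal is only to avoid the string, one option (a sketch, not something confirmed in this thread) is to call the GroupBy method itself. As far as I understand pandas' dispatch, agg("count") looks up the GroupBy method named count, whereas a callable pandas does not recognize, such as pd.Series.count, is called once per group in Python, which would explain the gap. The row count below is an assumption chosen so the sketch runs quickly, and no timing results are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Scaled-down version of the question's frame (assumption: 100_000 non-null rows instead of 10_000_000)
n = 100_000
rng = np.random.default_rng(0)
df_nan = pd.DataFrame({
    "a": rng.integers(0, 100, n * 2),
    "b": np.concatenate([np.linspace(0, 100, n), np.full(n, np.nan)]),
})

grouped = df_nan.groupby("a")

# String alias versus calling the GroupBy method directly (no string involved)
via_string = grouped.agg({"b": "count"})
via_method = grouped["b"].count().to_frame()
pd.testing.assert_frame_equal(via_string, via_method)

# Rough per-call timings for the three spellings; results depend on machine and pandas version
for label, stmt in [
    ("string", lambda: grouped.agg({"b": "count"})),
    ("method", lambda: grouped["b"].count()),
    ("callable", lambda: grouped.agg({"b": pd.Series.count})),
]:
    seconds = min(timeit.repeat(stmt, number=5, repeat=3)) / 5
    print(f"{label:>8}: {seconds * 1000:.1f} ms per call")
```

When several columns need different aggregations, the string form inside agg is still the most compact; the direct method call is mainly useful when a single aggregation is needed and strings are to be avoided.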

Max Kanter

This may help? https://stackoverflow.com/questions/38143717/groupby-in-python-pandas-fast-way – krewsayder Apr 17 '19 at 21:06
