In pandas, is there a way to speed up groupby + apply when the applied function also uses rolling with the min_periods argument?
Various people have suggested ways to speed up the combination of groupby and apply. For example, this page describes a very fast way to calculate a weighted moving average. However, I could not find any method that also handles rolling with the min_periods argument.
My sample code is below. Run as is, it takes more than 15 seconds in my environment.
from string import ascii_letters
import numpy as np
import pandas as pd
from numpy.random import choice
N = 15_000_000
np.random.seed(123)
letters = list(ascii_letters)
words = ["".join(choice(letters, 5)) for _ in range(30)]
df = pd.DataFrame({
    "hoge": choice(words, N),
    "fuga": choice(words, N),
    "piyo": choice(words, N),
    "metricA": np.random.rand(N),
    "metricB": np.random.rand(N),
})
# This takes over 15 seconds in my environment.
func = lambda group: group.shift(1).rolling(3, min_periods=1).mean()
df.groupby(['hoge', 'fuga', 'piyo'])[['metricA', 'metricB']].apply(func)
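
One direction that might be worth timing is to avoid apply altogether and use the built-in groupby shift and groupby rolling instead, since both accept the same arguments (including min_periods) and do not call a Python function once per group. The sketch below is only an illustration under some assumptions: the helper name shifted_rolling_mean and the small demo frame are hypothetical, the index of the frame is assumed to be unique, and whether it is actually faster on the 15M-row data would need to be measured.

```python
import numpy as np
import pandas as pd


def shifted_rolling_mean(df, keys, cols, window=3, min_periods=1):
    """Per-group shift(1) followed by a rolling mean, without groupby.apply.

    Hypothetical helper name; assumes df.index is unique.
    """
    # Step 1: shift each group by one row. The result keeps df's index.
    shifted = df.groupby(keys)[cols].shift(1)

    # Step 2: group the shifted values by the same keys and roll within
    # each group. min_periods is passed straight through to rolling().
    key_series = [df[k] for k in keys]
    rolled = (
        shifted.groupby(key_series)
        .rolling(window, min_periods=min_periods)
        .mean()
    )

    # Step 3: the result is indexed by (keys..., original index). Drop the
    # key levels and restore the original row order.
    return rolled.droplevel(list(range(len(keys)))).reindex(df.index)


# Small demo frame (made-up data, far smaller than the 15M rows above).
rng = np.random.default_rng(123)
n = 1_000
demo = pd.DataFrame({
    "hoge": rng.choice(["a", "b", "c"], n),
    "fuga": rng.choice(["x", "y"], n),
    "piyo": rng.choice(["p", "q"], n),
    "metricA": rng.random(n),
    "metricB": rng.random(n),
})
keys = ["hoge", "fuga", "piyo"]
cols = ["metricA", "metricB"]
fast = shifted_rolling_mean(demo, keys, cols)
```

The returned frame has the same index as the input, so it can be assigned back as new columns. Note that, depending on the pandas version and the group_keys setting, the apply version above may return the group keys in the index, so the two results may need to be aligned before comparing them.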