I have some time series data and I want to calculate a groupwise rolling regression of the last n days in Pandas and store the slope of that regression in a new column.
I searched the older questions, but they either haven't been answered or use the Pandas OLS function, which I heard is deprecated.
I figured that I could probably use df.rolling.apply() in combination with the scipy.stats.linregress function, but I can't figure out a lambda function that does what I want.
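To make clear what I mean by the slope, here is a minimal sketch of the calculation for a single window. The values are made up for illustration, and I am assuming the days are evenly spaced so that the position in the window can stand in for the day:

```python
import numpy as np
from scipy.stats import linregress

# made-up values for one 7-day window (illustrative only, not my real data)
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0])
x = np.arange(len(y))  # assumption: evenly spaced days, so positions 0..6

result = linregress(x, y)
slope = result.slope  # 2.0 (up to floating-point rounding) for this linear toy series
```

What I can't work out is how to apply this to every rolling window within each group.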
Here is some sample code:
import numpy as np
import pandas as pd
from scipy.stats import linregress
# make sample data
days = 21
groups = ['A', 'B', 'C']
data_days = list(range(days)) * len(groups)
values = np.random.rand(days*len(groups))
df = pd.DataFrame(data=zip(sorted(groups*days), data_days, values),
                  columns=['group', 'day', 'value'])
# calculate slope of regression of last 7 days
days_back = 7
grouped_data = df.groupby('group')
for g, data in grouped_data:
    window = data.rolling(window=days_back,
                          min_periods=days_back)
I need a new column called 'slope' in which, from day 7 onward, the slope of a linear regression through the last 7 days is stored.
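For reference, something along these lines might be what I am after. This is only a sketch: window_slope is a hypothetical helper name, the sample data is rebuilt with a fixed seed so the result is reproducible, and it assumes the rows are sorted by day within each group with no missing days, so that the position in the window can be used as the x variable.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# rebuild the sample data (seeded so the output is reproducible)
rng = np.random.default_rng(0)
days = 21
groups = ['A', 'B', 'C']
df = pd.DataFrame({
    'group': sorted(groups * days),
    'day': list(range(days)) * len(groups),
    'value': rng.random(days * len(groups)),
})

days_back = 7

def window_slope(values):
    # hypothetical helper: regress the window's values on positions 0..n-1
    # (assumes evenly spaced, consecutive days within the window)
    return linregress(np.arange(len(values)), values).slope

df['slope'] = (
    df.groupby('group')['value']
      .rolling(window=days_back, min_periods=days_back)
      .apply(window_slope, raw=True)       # raw=True passes a NumPy array
      .reset_index(level=0, drop=True)     # drop the group level to realign with df
)
```

With min_periods=days_back, the first six rows of each group are left as NaN, and the first slope appears on the seventh row of each group.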