I've been looking for the most current method to create a linear regression model from a pandas DataFrame.
The DataFrame looks like this:
+---------------------+-------------+--------------------+--------------------+
| Date                | YearWeekNum | Dependent_Variable | Bonus_Grouping_Int |
+---------------------+-------------+--------------------+--------------------+
| 2017-07-01 00:12:07 | 2017-Wk26   | 35.4               | 1                  |
| 2017-07-01 00:12:07 | 2017-Wk26   | 33.3               | 2                  |
| 2018-01-05 25:12:07 | 2018-Wk00   | 28.2               | 1                  |
| 2018-01-05 25:12:07 | 2018-Wk00   | 24.2               | 2                  |
+---------------------+-------------+--------------------+--------------------+
I've created the YearWeekNum column with:
df['YearWeekNum'] = df['Date'].dt.strftime('%Y-Wk%U')
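Note that strftime produces strings such as "2017-Wk26", not numbers, so a regression cannot use them as a continuous predictor directly. A minimal sketch of one way to get a numeric week index, assuming that "whole weeks elapsed since the earliest Date" is an acceptable predictor. The WeeksElapsed column name is hypothetical, and the 2018 rows use a hypothetical 00:12:07 time because "25:12:07" is not a valid hour and would not parse:

```python
import pandas as pd

# Sample rows from the table above. The 2018 timestamps use a hypothetical
# 00:12:07 time, since "25:12:07" is not a valid hour.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-07-01 00:12:07", "2017-07-01 00:12:07",
        "2018-01-05 00:12:07", "2018-01-05 00:12:07",
    ]),
    "Dependent_Variable": [35.4, 33.3, 28.2, 24.2],
    "Bonus_Grouping_Int": [1, 2, 1, 2],
})

# Same label as in the question: a string like "2017-Wk26"
# (%U = week of the year, weeks starting on Sunday, zero-padded).
df["YearWeekNum"] = df["Date"].dt.strftime("%Y-Wk%U")

# Hypothetical numeric predictor: whole weeks elapsed since the earliest date.
# Unlike the string label, it keeps the real spacing between weeks,
# including across the 2017/2018 year boundary.
df["WeeksElapsed"] = (df["Date"] - df["Date"].min()).dt.days // 7

print(df[["Date", "YearWeekNum", "WeeksElapsed"]])
```

A combined value like year * 100 + week would also be numeric, but it jumps from 201752 to 201800 at the year boundary, which distorts a straight-line fit; counting elapsed weeks avoids that.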
I'd like to create a linear regression that uses YearWeekNum as the independent (predictor) variable and Dependent_Variable as (you guessed it) the dependent (response) variable. In the end, I'd like a plot that looks like this:
Following this question, I tried result = sm.ols(formula="Dependent_Variable ~ YearWeekNum", data=df).fit(), but it creates a model that treats each YearWeekNum as its own independent variable (a separate coefficient for each week period).
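That behaviour is expected: the statsmodels formula interface treats a string column as categorical and dummy-codes it, giving one coefficient per week label instead of a single slope. A minimal sketch of the same formula call with a numeric predictor instead, assuming `sm` in the question is `statsmodels.formula.api` (imported here as `smf`). WeeksElapsed is a hypothetical "weeks since the first date" column, and the 2018 times are hypothetical because "25:12:07" does not parse:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Sample rows from the table; 2018 times are hypothetical (25:12:07 is invalid).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-07-01 00:12:07", "2017-07-01 00:12:07",
        "2018-01-05 00:12:07", "2018-01-05 00:12:07",
    ]),
    "Dependent_Variable": [35.4, 33.3, 28.2, 24.2],
    "Bonus_Grouping_Int": [1, 2, 1, 2],
})

# Hypothetical numeric predictor: whole weeks elapsed since the earliest date.
df["WeeksElapsed"] = (df["Date"] - df["Date"].min()).dt.days // 7

# With an integer column on the right-hand side, the formula fits one slope
# (plus an intercept) rather than one dummy coefficient per week label.
result = smf.ols("Dependent_Variable ~ WeeksElapsed", data=df).fit()
print(result.params)
```

`result.summary()` then reports the usual OLS table for that single slope.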
From another question, I also tried:
from pandas.stats.api import ols
but got:
ImportError: cannot import name 'ols'
It seems that pandas' ols has been deprecated and removed. So my question is: how can I run a linear regression on a DataFrame, using the year and week number as the independent variable?
The cherry on top would be creating two regression models based on the grouping int (the red line for values with grouping int 1 and the indigo line for values with grouping int 2).
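One way to get one line per group is to split the rows with `groupby` and fit each subset separately. A minimal sketch, assuming a numeric "weeks since the first date" predictor (the WeeksElapsed column is hypothetical), using `numpy.polyfit` for the degree-1 least-squares fit and matplotlib for the plot. The 2018 times are hypothetical because "25:12:07" is not a valid hour:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample rows from the table; 2018 times are hypothetical (25:12:07 is invalid).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-07-01 00:12:07", "2017-07-01 00:12:07",
        "2018-01-05 00:12:07", "2018-01-05 00:12:07",
    ]),
    "Dependent_Variable": [35.4, 33.3, 28.2, 24.2],
    "Bonus_Grouping_Int": [1, 2, 1, 2],
})

# Hypothetical numeric predictor: whole weeks elapsed since the earliest date.
df["WeeksElapsed"] = (df["Date"] - df["Date"].min()).dt.days // 7

# Line colours per group, following the description above.
colors = {1: "red", 2: "indigo"}

fits = {}  # group -> (slope, intercept)
fig, ax = plt.subplots()
for group, sub in df.groupby("Bonus_Grouping_Int"):
    # Degree-1 least-squares fit; polyfit returns [slope, intercept].
    slope, intercept = np.polyfit(sub["WeeksElapsed"], sub["Dependent_Variable"], 1)
    fits[group] = (slope, intercept)

    color = colors.get(group, "gray")
    ax.scatter(sub["WeeksElapsed"], sub["Dependent_Variable"], color=color)

    # Draw the fitted line across this group's range of weeks.
    x = np.array([sub["WeeksElapsed"].min(), sub["WeeksElapsed"].max()])
    ax.plot(x, slope * x + intercept, color=color, label=f"Grouping int {group}")

ax.set_xlabel("Weeks since first date")
ax.set_ylabel("Dependent_Variable")
ax.legend()
plt.show()
```

With only two distinct weeks per group in this sample, each fit passes exactly through its two points; with real data, each group gets its own least-squares line.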
Thanks in advance!