Based on that topic I created a wrapper for statsmodels' GLM so that I can use scikit-learn's cross_val_score. Now I need to introduce variance/analytic weights via the var_weights parameter of sm.GLM.
import statsmodels.api as sm
from sklearn.base import BaseEstimator, RegressorMixin

class Wrapper(BaseEstimator, RegressorMixin):
    def __init__(self, family, alpha, L1_wt, var_weights):
        self.family = family
        self.alpha = alpha
        self.L1_wt = L1_wt
        self.var_weights = var_weights

    def fit(self, X, y):
        self.model = sm.GLM(endog=y, exog=X, family=self.family,
                            var_weights=self.var_weights)
        self.result = self.model.fit_regularized(alpha=self.alpha, L1_wt=self.L1_wt)
        return self  # scikit-learn expects fit to return the estimator itself

    def predict(self, X):
        return self.result.predict(X)
The wrapper lets me successfully run:

sm_glm = Wrapper(family, alpha, L1_wt, var_weights)
sm_glm.fit(x, y)
But the cross-validation

cross_val_score(sm_glm, x, y, cv, scoring)

doesn't work, because cross_val_score slices x and y according to the CV folds but leaves var_weights untouched, which leads to an error:

ValueError: var weights not the same length as endog

The way I see it, I would need to track the cross_val_score iterations dynamically and slice var_weights accordingly. Any ideas for a workaround?