Incorporating Prior Information in a Ridge Regression (RAPM) in R

Question

I am currently using R’s glmnet package to run a weighted ridge regression on hockey data. I have a sparse matrix with dummy variables denoting whether a player is on the ice playing offense or defense for a given shift, in addition to a few other predictors such as home ice advantage. I have a vector of weights which is the length of each shift. My target variable is a vector of shot rates that occur in a given shift.

The glmnet call is as follows:

glmnet(y = shot_rates, x = dummy_matrix, weights = shift_length, alpha = 0, lambda = previously_obtained_lambda)

(The lambda is obtained through cross-validation on the same data set, also using glmnet.)
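For readers who want to reproduce the setup described above, here is a minimal sketch on simulated data. The toy dimensions, the simulated values, and the use of `lambda.min` are assumptions made for illustration, not details from the original analysis. Note that `alpha = 0` selects the ridge penalty; glmnet's default is `alpha = 1` (lasso).

```r
library(Matrix)
library(glmnet)

set.seed(1)
n_shifts  <- 500   # toy size; the real matrix has roughly 600,000 rows
n_players <- 20    # toy size; the real matrix has roughly 2,000 columns

# Hypothetical sparse 0/1 matrix standing in for the on-ice dummy variables
x_dense <- matrix(rbinom(n_shifts * n_players, 1, 0.2), n_shifts, n_players)
dummy_matrix <- Matrix(x_dense * 1.0, sparse = TRUE)

# Hypothetical shift lengths (used as weights) and simulated shot rates
shift_length <- runif(n_shifts, 10, 60)
true_effects <- rnorm(n_players, 0, 0.5)
shot_rates   <- as.numeric(dummy_matrix %*% true_effects) + rnorm(n_shifts)

# Choose lambda by cross-validation with the ridge penalty (alpha = 0)
cv_fit <- cv.glmnet(x = dummy_matrix, y = shot_rates,
                    weights = shift_length, alpha = 0)

# Refit at the selected lambda; every coefficient is shrunk towards zero
ridge_fit <- glmnet(x = dummy_matrix, y = shot_rates,
                    weights = shift_length, alpha = 0,
                    lambda = cv_fit$lambda.min)

coef(ridge_fit)
```

The first row of `coef(ridge_fit)` is the intercept, followed by one coefficient per column of `dummy_matrix`.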

As of right now the model is entirely Gaussian and every coefficient is shrunk towards a mean of zero. I am looking to incorporate prior information (prior means) for each dummy variable and possibly set a separate lambda value for each of them, but I am not sure how to go about doing that. I believe I can use penalty.factor to adjust the lambdas for each variable, so we can put that aside for now and focus on the prior means.

I have looked into using bayesglm (from the arm package) and its prior.mean argument, but my issues with it are two-fold: it is slow to begin with, and it does not accept sparse matrices, which makes things significantly slower. For reference, my matrix of dummy variables contains roughly 600,000 rows and roughly 2,000 columns.

How might I go about efficiently incorporating prior means into my analysis? Thanks in advance for any suggestions.

Is there a reason that you want to use a ridge regression instead of building a fully Bayesian model? — sjp, Nov 26 '20 at 22:32
In all honesty, the way that I’ve learned this method is through ridge regression and I’ve found the code to be quite efficient. In theory I’m not against a fully Bayesian model but admittedly I’m not an expert on Bayesian statistics and more importantly my early experience with bayesglm was that it would be quite slow which is less than ideal for me since I’d like to use trial and error to refine my process. I’m open to suggestions though; what do you have in mind for a Bayesian model? — Topdownhockey, Nov 26 '20 at 22:35

sjp · Answer 1 · 2020-11-27T03:22:08.900

Okay, so based on your comment it seems that a Bayesian approach is okay with you in principle, and it's the only way I know to have regularizing priors not centred at 0. You also mentioned that speed was an issue, which is why I would recommend fitting the model with Stan, which is generally much faster than other Bayesian tools. Also, brms and Stan have absolutely wonderful documentation that is much more useful than what you normally find for other statistical packages in R.

brms is a very useful package that allows the fitting of Stan models in an lme4-like syntax.

In brms, priors can be specified for the intercept and each independent variable like this:

library(brms)

# One prior for the intercept, one shared by every slope coefficient
model_priors <- c(
  prior(normal(0, 5), class = "Intercept"),
  prior(normal(0, 1), class = "b")
)

This code puts a normal prior with a mean of 0 and an sd of 5 on the intercept, as well as a normal prior with a mean of 0 and an sd of 1 on each beta coefficient. If you want each beta coefficient to have its own prior, you can specify it like this:

model_priors <- c(
  prior(normal(0, 5), class = "Intercept"),
  prior(normal(0.5, 1), class = "b", coef = "first_predictor"),
  prior(normal(-0.5, 1), class = "b", coef = "second_predictor")
)

This code sets specific priors for each of two hypothetical beta coefficients. Notice that they are no longer centred at 0.
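To connect these priors back to the ridge penalty in the question, here is a short derivation sketch. The notation is assumed for illustration and does not come from the original post: $w_i$ are the shift-length weights, $m_j$ the prior means, $\tau_j$ the prior standard deviations, and $\sigma$ the residual standard deviation, treated as fixed.

```latex
% Assumed model (illustrative notation):
%   y_i ~ Normal(x_i^T beta, sigma^2 / w_i),   beta_j ~ Normal(m_j, tau_j^2)
\begin{align}
-\log p(\beta \mid y)
  &= \frac{1}{2\sigma^2} \sum_i w_i \,(y_i - x_i^\top \beta)^2
   + \sum_j \frac{(\beta_j - m_j)^2}{2\tau_j^2} + \text{const} \\
\hat\beta_{\text{MAP}}
  &= \arg\min_\beta \; \sum_i w_i \,(y_i - x_i^\top \beta)^2
   + \sum_j \lambda_j \,(\beta_j - m_j)^2,
  \qquad \lambda_j = \frac{\sigma^2}{\tau_j^2}
\end{align}
```

With every $m_j = 0$ and a common $\tau$, this is the usual weighted ridge objective, so a smaller prior sd plays the role of a larger lambda, and a nonzero prior mean simply moves the point each coefficient is shrunk towards. Substituting $\delta = \beta - m$ turns the same objective into an ordinary ridge problem in $\delta$ with response $y_i - x_i^\top m$, though whether a particular ridge implementation's scaling and standardization preserve that equivalence would need checking. In brms, sigma is estimated rather than fixed, so the correspondence is only approximate there.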

You could then incorporate these priors into the model something like this:

# Fit the model with the priors defined above (Gaussian family by default)
modelfit <- brm(
  formula = outcome_variable ~ first_predictor + second_predictor,
  data = df,
  prior = model_priors
)
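Putting the pieces together, here is a minimal end-to-end sketch on simulated data. The toy data frame, the prior sds, and the sampler settings (`chains`, `iter`, `seed`) are assumptions chosen for illustration. Giving each coefficient its own sd is one way to mimic the separate lambda values mentioned in the question.

```r
library(brms)

set.seed(42)
# Hypothetical toy data; column names mirror the placeholders used above
df <- data.frame(
  first_predictor  = rbinom(200, 1, 0.5),
  second_predictor = rbinom(200, 1, 0.5)
)
df$outcome_variable <- 1 + 0.6 * df$first_predictor -
  0.4 * df$second_predictor + rnorm(200)

model_formula <- outcome_variable ~ first_predictor + second_predictor

# List the parameters brms will estimate, to confirm the names
# that the `coef =` argument of prior() has to match
get_prior(model_formula, data = df)

# Different means AND different sds per coefficient:
# a tighter sd shrinks that coefficient harder towards its prior mean
model_priors <- c(
  prior(normal(0, 5), class = "Intercept"),
  prior(normal(0.5, 0.5), class = "b", coef = "first_predictor"),
  prior(normal(-0.5, 2), class = "b", coef = "second_predictor")
)

# Small sampler settings (assumed) to keep the toy run short
modelfit <- brm(
  formula = model_formula,
  data = df,
  prior = model_priors,
  chains = 2,
  iter = 1000,
  seed = 42
)

# Posterior summaries of the regression coefficients
fixef(modelfit)
```

Running `get_prior()` before writing the priors is a cheap way to catch a misspelled coefficient name, which matters more once there are thousands of player columns.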
