
I've been experimenting with the R package 'biglasso' for high-dimensional data. However, the results I'm getting don't match the results I get from the LASSO functions in 'hdm' or 'glmnet'. The documentation for biglasso is also quite poor.

In the example below, the results from hdm and glmnet are very close but not exact, which is expected. However, biglasso doesn't drop the 'share' variable. I've tried all the different screen settings, and it doesn't make a difference. Any thoughts on how to get biglasso to be more consistent with the others? Thanks!

EDIT: For a given value of lambda, the results are highly similar, but each method selects a different lambda. For hdm that makes sense, given that it's intended for causal inference rather than out-of-sample prediction and uses a different objective function from Belloni et al. (2012). But I'm not sure why cv.biglasso and cv.glmnet would differ so much: if I run biglasso without a screening rule, shouldn't both be optimizing the same objective function, with only random differences in the CV folds?
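(As a quick sanity check, here's a rough sketch of what I mean, using the objects defined in the code below and an arbitrary, purely illustrative lambda value: fit both packages at one shared lambda and compare the coefficients directly, taking the CV-based lambda choice out of the picture.)

## Sketch: compare glmnet and biglasso at a single, arbitrary fixed lambda
lam <- 0.05
fit.g <- glmnet(df.mat, Y, alpha = 1, lambda = lam)
fit.b <- biglasso(df.bm, Y, penalty = "lasso", screen = "None", lambda = lam)
cbind(glmnet = as.matrix(coef(fit.g))[, 1], biglasso = as.matrix(coef(fit.b))[, 1])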

EDIT 2: I've edited the code below to include F. Privé's suggestion for making glmnet use the same lambda-selection rule as biglasso, plus some additional code to make biglasso mimic glmnet.

##########
## PREP ##
##########

## Load required libraries
library(hdm)
library(biglasso)
library(glmnet)

## Read automobile dataset
data(BLP)
df <- BLP[[1]]

## Extract and standardize the outcome (mpg)
Y <- scale(df$mpg)

## Rescale variables
df$price <- scale(df$price)
df$mpd <- scale(df$mpd)
df$space <- scale(df$space)
df$hpwt <- scale(df$hpwt)
df$outshr <- scale(df$outshr)

## Limit to variables I want
df <- df[,names(df) %in% c("price","mpd","space","hpwt","share","outshr","air")]

## Convert to matrix
df.mat <- data.matrix(df)
df.bm <- as.big.matrix(df.mat)

#########
## HDM ##
#########

## Set seed for reproducibility
set.seed(1233)

## Run LASSO
fit.hdm <- rlasso(x=df.mat, y=Y, post=FALSE, intercept=TRUE)

## Check results
coef(fit.hdm)
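
## Sketch: rlasso sets its penalty level analytically rather than by
## cross-validation; assuming the fitted object stores it as `lambda0`
## (per the hdm documentation), this shows the penalty it settled on
fit.hdm$lambda0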

############
## GLMNET ##
############

## Set seed for reproducibility
set.seed(1233)

## LASSO with 10-fold cross-validation
fit.glmnet <- cv.glmnet(df.mat, Y, alpha=1, family="gaussian")

## Check default results
coef(fit.glmnet)

## Try to mimic results of biglasso
coef(fit.glmnet, s = "lambda.min")

##############
## BIGLASSO ##
##############

## LASSO with 10-fold cross-validation
fit.bl <- cv.biglasso(df.bm, Y, penalty="lasso", eval.metric="default",
    family="gaussian", screen="None",
    seed=1233, nfolds=10)

## Check default results
coef(fit.bl) 

## Try to mimic results of glmnet
## Approximate a 1-SE threshold for the CV error (minimum plus a rough standard error)
thresh <- min(fit.bl$cve) + sd(fit.bl$cve)/sqrt(100)

## Identify highest lambda with CVE at or below threshold
max.lambda <- max(fit.bl$lambda[fit.bl$cve <= thresh])

## Check results for the given lambda
coef(fit.bl$fit)[,which(fit.bl$fit$lambda==max.lambda)]
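
## Alternative sketch for the 1-SE rule: assuming cv.biglasso also stores a
## per-lambda standard error (`cvse`) and the index of the CV-minimizing
## lambda (`min`), use those instead of sd(cve)/sqrt(100)
thresh.1se <- min(fit.bl$cve) + fit.bl$cvse[fit.bl$min]
lambda.1se <- max(fit.bl$lambda[fit.bl$cve <= thresh.1se])
coef(fit.bl$fit)[, which(fit.bl$fit$lambda == lambda.1se)]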

kng229
  • Is it just that they have different values for lambda? I don't see you set it here. – Noah Hammarlund Nov 24 '19 at 02:03
  • For biglasso and glmnet, shouldn't they be selecting the optimal lambda through the same cross-validation procedure? To my understanding, hdm follows a slightly different CV procedure but should be close! – kng229 Nov 24 '19 at 02:07
  • CV can be a very noisy procedure. Rather than testing this, consider testing the base fitting algorithm – Hong Ooi Nov 24 '19 at 02:07
  • Interesting, can you say more? Without the CV, biglasso and glmnet output a matrix of coefficients, by default with 100 columns, each for a different value of lambda. hdm just reports one result. What are you suggesting I do with those? – kng229 Nov 24 '19 at 02:11
  • 1
    I don't think any of these do cv internally. They are just one instance of the algorithm. You have to set lambda to something different to the default value in R. For instance cv.glmnet does the cv for the glm package. Then you could choose a lamda from that procedure and use it on each package's lasso. https://stackoverflow.com/questions/29311323/difference-between-glmnet-and-cv-glmnet-in-r – Noah Hammarlund Nov 24 '19 at 03:27
  • cv.glmnet and cv.biglasso both do cross-validation; rlasso follows a different algorithm. I've edited the question a bit: the difference appears to be driven by each command's choice of a final lambda. – kng229 Nov 24 '19 at 18:07
  • So now it's probably the decision rule for the final lambda or the details of the cv. I don't know which metric they use for each algorithm (rmse?), and some may use the maximum/minimum while others may use the 1-standard-error rule. Also, you'd expect the lambda to be slightly different due to chance. – Noah Hammarlund Nov 24 '19 at 19:27
  • You still have to "choose" the lambda after these cv functions. Right now you are using the default way to choose: https://stats.stackexchange.com/questions/138569/why-is-lambda-within-one-standard-error-from-the-minimum-is-a-recommended-valu. So do something like fit.glmnet$lambda.1se. Then take a given lambda and see if the non-cv versions of the functions give similar results with that chosen lambda. – Noah Hammarlund Nov 24 '19 at 19:37

1 Answer


There are basically two ways to choose the "best" lambda after CV:

  • The one that minimizes the CV error (default of {biglasso})

  • The one that is the most parsimonious (highest lambda) whose CV error is within one standard error of the minimum (default of {glmnet}).

Try coef(fit.glmnet, s = "lambda.min") to use the minimum.
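
For example (a sketch reusing the fit.glmnet object from the question), the two rules side by side:

## Both rules are available from the same cv.glmnet fit
coef(fit.glmnet, s = "lambda.min")  # minimize CV error (biglasso's default rule)
coef(fit.glmnet, s = "lambda.1se")  # most parsimonious within 1 SE (glmnet's default)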

Also, to ensure reproducibility, try setting the CV folds explicitly instead of relying on a seed. The relevant parameters are foldid in cv.glmnet() and cv.ind in cv.biglasso().
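
Something like this (an untested sketch reusing the question's objects; the fold assignment itself is arbitrary) should give both CV routines exactly the same folds:

## One fold assignment, passed to both CV functions
set.seed(1233)
folds <- sample(rep(1:10, length.out = nrow(df.mat)))
fit.glmnet <- cv.glmnet(df.mat, Y, alpha = 1, family = "gaussian", foldid = folds)
fit.bl <- cv.biglasso(df.bm, Y, penalty = "lasso", screen = "None",
                      nfolds = 10, cv.ind = folds)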

F. Privé
  • 1
    This is great. I've edited my question to include some code to make the biglasso results mimic glmnet – kng229 Dec 14 '19 at 04:44