I've been experimenting with the R package 'biglasso' for high-dimensional data. However, the results I'm getting don't match those from the LASSO functions in 'hdm' or 'glmnet'. The documentation for biglasso is also really sparse.
In the example below, the results from hdm and glmnet are very close but not exact, which is expected. However, biglasso doesn't drop the 'share' variable. I've tried all the different screen settings, and it doesn't make a difference. Any thoughts on how to get biglasso to be more consistent with the others? Thanks!
EDIT: for a given value of lambda, the results are highly similar. But each method selects a different lambda, which makes sense for hdm, given that it's intended for causal inference and isn't concerned with out-of-sample prediction; it uses a different objective function, from Belloni et al. (2012). I'm not sure, though, why cv.biglasso and cv.glmnet would differ so much. If I run biglasso without a screening rule, they should be minimizing the same objective, differing only in the random assignment of CV folds, no? (See the fold-matching sketch after the code below.)
EDIT 2: I've edited the code below to include F. Privé's code to make glmnet use an algorithm similar to biglasso, and some additional code to make biglasso mimic glmnet.
##########
## PREP ##
##########
## Load required libraries
library(hdm)
library(biglasso)
library(glmnet)
## Read automobile dataset
data(BLP)
df <- BLP[[1]]
## Extract outcome
Y <- scale(df$mpg)
## Rescale variables
df$price <- scale(df$price)
df$mpd <- scale(df$mpd)
df$space <- scale(df$space)
df$hpwt <- scale(df$hpwt)
df$outshr <- scale(df$outshr)
## Limit to variables I want
df <- df[,names(df) %in% c("price","mpd","space","hpwt","share","outshr","air")]
## Convert to matrix, then to a big.matrix for biglasso
## (as.big.matrix() comes from bigmemory, which biglasso attaches)
df.mat <- data.matrix(df)
df.bm <- as.big.matrix(df.mat)
#########
## HDM ##
#########
## Set seed for reproducibility
set.seed(1233)
## Run LASSO
fit.hdm <- rlasso(x=df.mat, y=Y, post=FALSE, intercept=TRUE)
## Check results
coef(fit.hdm)
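To compare which variables each method keeps, a small helper that lists the nonzero coefficients should work on all three fits (assuming coef() returns either a named vector, as for rlasso, or a one-column, possibly sparse, matrix, as for the CV objects):
## Helper: list variables with nonzero coefficients, intercept excluded
## (as.matrix() also flattens glmnet's sparse coefficient output)
selected <- function(cf) {
  cf <- as.matrix(cf)
  rownames(cf)[cf[, 1] != 0 & rownames(cf) != "(Intercept)"]
}
selected(coef(fit.hdm))
The same call on coef(fit.glmnet) and coef(fit.bl) below makes the 'share' discrepancy easy to see.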
############
## GLMNET ##
############
## Set seed for reproducibility
set.seed(1233)
## LASSO with 10-fold cross-validation
fit.glmnet <- cv.glmnet(df.mat, Y, alpha=1, family="gaussian")
## Check default results (coef() on a cv.glmnet object uses lambda.1se)
coef(fit.glmnet)
## Try to mimic biglasso, which defaults to the CV-minimizing lambda
coef(fit.glmnet, s = "lambda.min")
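Part of the gap may just be the default lambda grids: glmnet controls the grid via lambda.min.ratio while biglasso uses lambda.min, so the two CV curves are evaluated at different points. A sketch of putting biglasso on glmnet's grid; I'm assuming here that cv.biglasso forwards a user-supplied lambda through ... to biglasso():
## Re-run biglasso's CV on glmnet's lambda sequence so both methods
## evaluate the same candidate penalties
fit.bl.grid <- cv.biglasso(df.bm, Y, penalty = "lasso", screen = "None",
                           family = "gaussian", seed = 1233, nfolds = 10,
                           lambda = fit.glmnet$lambda)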
##############
## BIGLASSO ##
##############
## LASSO with 10-fold cross-validation
fit.bl <- cv.biglasso(df.bm, Y, penalty="lasso", eval.metric="default",
                      family="gaussian", screen="None",
                      seed=1233, nfolds=10)
## Check default results (coef() on a cv.biglasso object uses lambda.min)
coef(fit.bl)
## Try to mimic glmnet's 1-SE rule
## Threshold: minimum CV error plus its standard error
## (cv.biglasso returns the per-lambda standard errors in $cvse)
thresh <- min(fit.bl$cve) + fit.bl$cvse[which.min(fit.bl$cve)]
## Identify highest lambda with CVE at or below threshold
max.lambda <- max(fit.bl$lambda[fit.bl$cve <= thresh])
## Check results for the given lambda
coef(fit.bl$fit)[,which(fit.bl$fit$lambda==max.lambda)]
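Finally, the fold-matching check mentioned in the first EDIT: both packages accept an explicit fold assignment (foldid in cv.glmnet, cv.ind in cv.biglasso), so feeding them the same assignment should separate fold randomness from any real algorithmic difference. A sketch, reusing the objects above:
## Build one fold assignment and reuse it in both CV routines
set.seed(1233)
folds <- sample(rep(1:10, length.out = nrow(df.mat)))
fit.glmnet.cv <- cv.glmnet(df.mat, Y, alpha = 1, family = "gaussian",
                           foldid = folds)
fit.bl.cv <- cv.biglasso(df.bm, Y, penalty = "lasso", screen = "None",
                         family = "gaussian", nfolds = 10, cv.ind = folds)
## With identical folds, any remaining gap in the selected lambdas
## should come from the grids or objective scaling, not the CV draw
c(glmnet = fit.glmnet.cv$lambda.min, biglasso = fit.bl.cv$lambda.min)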