For my PhD I use a lasso approach in R for variable selection. I have tried both the glmnet and the hdm packages. What is the difference between the basic lasso estimators for logistic regression in these two packages? I read the docs and googled a lot, but the only hint I found was this one, which was not very helpful for my exact purpose.

I am asking because my models converge when I use glmnet, but they sometimes fail to converge when I use hdm. I therefore assume that the difference lies in the optimization routine. Here is a minimal example:

# Delete environment
rm(list = ls())

# Packages
library(glmnet)
#> Loading required package: Matrix
#> Loaded glmnet 4.1-4
library(hdm)

# get data
data = read.table("https://pastebin.com/raw/gmXk0h2P", sep = ",", header = TRUE)

# do the lasso
lasso_hdm = rlassologit(dep ~ ., data = data)
#> Warning: from glmnet C++ code (error code -1); Convergence for 1th lambda value
#> not reached after maxit=100000 iterations; solutions for larger lambdas returned
#> Warning in getcoef(fit, nvars, nx, vnames): an empty model has been returned;
#> probably a convergence issue
lasso_glm = glmnet(as.matrix(data[,!(names(data) %in% c("dep"))]), data$dep, family = "binomial")

Created on 2022-05-31 by the reprex package (v2.0.1)

Additionally, please find my sessionInfo:

sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_GB.UTF-8    
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_GB.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13   knitr_1.39        magrittr_2.0.3    R.cache_0.15.0   
#>  [5] rlang_1.0.2       fastmap_1.1.0     fansi_1.0.3       stringr_1.4.0    
#>  [9] styler_1.7.0      highr_0.9         tools_4.2.0       xfun_0.31        
#> [13] R.oo_1.24.0       utf8_1.2.2        cli_3.3.0         withr_2.5.0      
#> [17] htmltools_0.5.2   ellipsis_0.3.2    yaml_2.3.5        digest_0.6.29    
#> [21] tibble_3.1.7      lifecycle_1.0.1   crayon_1.5.1      purrr_0.3.4      
#> [25] R.utils_2.11.0    vctrs_0.4.1       fs_1.5.2          glue_1.6.2       
#> [29] evaluate_0.15     rmarkdown_2.14    reprex_2.0.1      stringi_1.7.6    
#> [33] compiler_4.2.0    pillar_1.7.0      R.methodsS3_1.8.1 pkgconfig_2.0.3

Ultimately, I am interested in the theory behind both packages, and I hope to find a sound reason to stick with glmnet, since it converges.
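One observation from the output above: the warning is raised by glmnet's own C++ code and refers to the "1th lambda value", which suggests that rlassologit hands the fit to glmnet with a single penalty value, whereas my plain glmnet() call fits a whole decreasing lambda path with warm starts. The glmnet documentation advises against supplying a single lambda for exactly this kind of reason. Below is a minimal sketch of that comparison. It uses simulated data rather than my real data, and `lambda_single` is an arbitrary value I picked for illustration, not the penalty that hdm actually computes:

```r
# Sketch only: simulated data, not the data set from the question.
library(glmnet)

set.seed(1)
n <- 200
p <- 20
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("x", seq_len(p))
eta <- 1.5 * x[, 1] - 1.0 * x[, 2]      # only two informative predictors
y <- rbinom(n, 1, plogis(eta))

# (a) Default glmnet usage: fit the whole lambda path with warm starts.
fit_path <- glmnet(x, y, family = "binomial")

# (b) A single, hand-picked lambda (hypothetical value, NOT hdm's choice).
lambda_single <- 0.05
fit_single <- glmnet(x, y, family = "binomial", lambda = lambda_single)

# Coefficients at the same penalty level from both fits.
# coef(fit_path, s = ...) interpolates between the lambdas on the path.
coef_path   <- as.matrix(coef(fit_path, s = lambda_single))[, 1]
coef_single <- as.matrix(coef(fit_single))[, 1]

print(cbind(path = coef_path, single = coef_single))
```

On well-behaved data like this, both fits should give similar coefficients. If the single-lambda fit fails on the real data while the path fit succeeds, that would point to the cold start at one lambda, rather than to a different estimator, as the source of the convergence problem.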

Thank you so much in advance!

Irazall
  • This is a pretty challenging/deep question for Stack Overflow: someone will have to dig fairly deeply into the machinery and/or documentation in order to figure this out ... good luck ... FWIW the hdm vignette says that the package uses the "Shooting Lasso Algorithm (Fu, 1998)". Have you looked at that paper yet? – Ben Bolker May 29 '22 at 20:03
  • It would help if you can come up with a [mcve], possibly by simulating data that look like (but are not the same as) your data, so we can see an example that converges when fitted with one package but not with the other ...? – Ben Bolker May 29 '22 at 20:09
  • A better venue would be stats.stackexchange.com or perhaps the Data Science forum. This is only on topic for SO if you can produce a specific example that displays the undesirable behavior. – IRTFM May 30 '22 at 04:35
  • I added an example and also posted the question on stats.stackexchange.com here: https://stats.stackexchange.com/questions/577177/r-what-is-the-difference-of-the-lasso-for-variable-selection-between-the-packag – Irazall May 30 '22 at 22:35
  • Please do **not** cross-post if possible -- pick one forum and go with it. – Ben Bolker May 30 '22 at 22:51
