I have a large dataset on which I've been trying to run a lasso regression. Categorical variables are recoded as dummy variables. After receiving several messages about limited memory, I converted my data into a sparse matrix using the Matrix package.
The issue is that my code has been running for a long time (several hours without completion), and I'm not sure why.
Here is a sample of 2000 rows of data (~0.3% of data) that produces the same issue: https://drive.google.com/file/d/1ZhyFIoxJSRHrC_eIe58C5zXFKJW-13Lm/view?usp=sharing
This is the code I've been using:
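library(tidyverse)
library(Matrix)
# install.packages("glmnet")  # run once if glmnet is not installed
library(glmnet)
pacman::p_load(methods, utils, foreach, shape, survival, Rcpp, RcppEigen)

# data_sample is the linked 2000-row sample, assumed to be already read into R as a data frame
# Convert to a sparse matrix to reduce memory use
data_sample_matrix <- Matrix(as.matrix(data_sample), sparse = TRUE)

# 80/20 train/test split
set.seed(879)
split <- sample(nrow(data_sample_matrix), floor(0.8 * nrow(data_sample_matrix)))
train <- data_sample_matrix[split, ]
test <- data_sample_matrix[-split, ]

# Column 28 is the outcome; drop it from the predictor matrices
train_s <- train[, -28]
test_s <- test[, -28]

# Cross-validated lasso (alpha = 1) logistic regression
cv_model <- cv.glmnet(train_s, train[, 28], alpha = 1, family = "binomial",
                      nlambda = 10, trace.it = TRUE)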
Note: I've explicitly loaded all the packages that CRAN lists as glmnet dependencies, because I noticed that they weren't being loaded when I ran library(glmnet).
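In case it helps with diagnosing the note above, this is roughly how one could check whether those packages are loaded as namespaces or attached to the search path. It is only a sketch: `deps`, `is_loaded`, and `is_attached` are illustrative names of my own, and the glmnet step is skipped if the package is not installed.

```r
# Packages named in the p_load() call above (deps is just an illustrative name)
deps <- c("methods", "utils", "foreach", "shape", "survival", "Rcpp", "RcppEigen")

# Only load glmnet if it is installed, so the check does not error out
if (requireNamespace("glmnet", quietly = TRUE)) {
  library(glmnet)
}

# Loaded as a namespace (available to other packages internally)?
is_loaded <- deps %in% loadedNamespaces()

# Attached to the search path (what library() does)?
is_attached <- paste0("package:", deps) %in% search()

print(data.frame(package = deps, loaded = is_loaded, attached = is_attached))
```

A package can be loaded without being attached, so `loaded = TRUE` together with `attached = FALSE` would not by itself indicate a problem.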
Note: column 28 ([, 28]) is my outcome variable.
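To illustrate how the outcome and predictors are separated, here is a minimal sketch on made-up data rather than my real dataset. The `toy` data frame and its column names are hypothetical, and the outcome sits in column 4 instead of column 28.

```r
library(Matrix)

# Hypothetical toy data: 6 rows, 3 predictors, binary outcome in column 4
toy <- data.frame(
  x1 = c(0, 1, 0, 0, 1, 0),
  x2 = c(0, 0, 2.5, 0, 0, 0),
  x3 = c(1, 0, 0, 0, 0, 3),
  y  = c(0, 1, 0, 1, 1, 0)
)

# Same conversion as in the code above: dense matrix first, then sparse
toy_sparse <- Matrix(as.matrix(toy), sparse = TRUE)

x <- toy_sparse[, -4]  # predictors: still a sparse matrix
y <- toy_sparse[, 4]   # outcome: a single column is dropped to a plain numeric vector

print(class(x))
print(class(y))
print(table(y))
```

This mirrors how `train_s` and `train[, 28]` are passed to `cv.glmnet` as the predictor matrix and the response, respectively.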
Can anyone point to what I'm doing wrong?