I have mixed data (both quantitative and categorical) for predicting a quantitative variable. I converted the categorical variables to factors before passing them to a glm model in R. Most of the categorical variables have more than 150 levels. When I try to fit the glm model, it fails with memory issues because these factors have so many levels. One option is to set a threshold and keep only the variables with up to a certain number of levels, but I need to include the factors with many levels in the model. Is there a methodology for addressing this issue?
Edit: The dataset has 120000 rows and 50 columns. When the data is expanded with model.matrix, there are 4772 columns.
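To make the size of the problem concrete, here is a rough sketch of where the memory goes. It uses only base R. The toy data frame, its column names (`y`, `f`) and the level labels are made up for illustration and are not my real data. The estimate assumes the design matrix is stored densely as doubles (8 bytes per entry), and glm likely needs more than this, since fitting works with more than one object of that size.

```r
# Toy illustration: one factor with 150 levels expands to 149 dummy
# columns plus the intercept under the default treatment contrasts.
set.seed(1)
lvls <- sprintf("lvl%03d", 1:150)            # hypothetical level labels
toy <- data.frame(
  y = rnorm(1000),
  f = factor(sample(lvls, 1000, replace = TRUE), levels = lvls)
)
X <- model.matrix(y ~ f, data = toy)
toy_cols <- ncol(X)                          # intercept + 149 dummies

# Rough size of the dense design matrix for the dimensions stated above.
n_rows <- 120000
n_cols <- 4772
dense_bytes <- n_rows * n_cols * 8           # 8 bytes per double
dense_gib <- dense_bytes / 1024^3            # roughly 4.3 GiB

print(toy_cols)
print(round(dense_gib, 2))
```

So a single dense copy of the expanded matrix is already several GiB before any fitting starts.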