I'm new to R and want to implement lasso on my data in order to feature selection according to the coefficient estimated by this algorithm. My data base is big and There are 40 predictors(continuous and categorical).when I apply lasso regression using glmnet package, all the coefficients that are estimated for each predictor in this algorithm are zero except the intercept, why this happen? Is the model over fitted? How can I fix it?The code I used for this section is:
#Transforming categorical variables:
xfactors <- model.matrix(Bill_TotalCharge ~addNA(P_AgeGroup) +
addNA(ADT_ConditionOnDischarge) + addNA(Provider_Profession) +
addNA(ADT_HospitalName) + addNA(ADT_Province) + addNA(ADT_City) +
addNA(DiagnosisValueGroup) + addNA(DiagnosisGroupLevel1) +
addNA(DiagnosisGroupLevel2) + addNA(Bill_Insurer) + addNA(Bill_InsurerType1)
+ addNA(Bill_InsurerType2) + addNA(Bill_InsurerBox) +
addNA(ADT_AdmissionType) + addNA(Bill_RecordType) + addNA(P_MaritalStatus) +
addNA(Gender) + addNA(MonthNumberOfYear) + addNA(CalenderYear) ,
na.action=na.exclude)[,-1]
#Creating matrix of combination of contniuous and categorical varriables
x <- as.matrix(data.frame(Bill_TotalBasicInsurance, Bill_TotalPatient
,Bill_TotalCost1,Bill_TotalCost2, Bill_TotalCost3 , Bill_TotalCost4 ,
Bill_TotalCost5 , Bill_TotalCost6 , Bill_TotalCost7 , Bill_TotalCost8
,Bill_TotalCost9 ,Bill_TotalCost10 ,Bill_TotalCost11 ,Bill_TotalCost12 ,
P_Age, xfactors))
#Running lasso
glmmod <- glmnet(x, y=Bill_TotalCharge, family="gaussian",alpha=1)
Then I want to use cv.glmnet function to determine the min_lambda with cross validation and unbelievably it returns a 6_digits number as a min lambda(lambda and subsequently alpha should be between zero and one).What is the problem and how can I fix it?The code I used for this reason is:
cv.glmmod <- cv.glmnet(x, y=Bill_TotalCharge, alpha=1)
best.lambda <- cv.glmmod$lambda.min
I appreciate any help greatly in advance.