
I'm new to R and want to fit a lasso to my data in order to do feature selection based on the coefficients it estimates. My data set is large and has 40 predictors (continuous and categorical). When I fit a lasso regression with the glmnet package, every estimated coefficient is zero except the intercept. Why does this happen? Is the model overfitted? How can I fix it? The code I used for this is:

#Transforming categorical variables: 
xfactors <- model.matrix(Bill_TotalCharge ~ addNA(P_AgeGroup) +
    addNA(ADT_ConditionOnDischarge) + addNA(Provider_Profession) +
    addNA(ADT_HospitalName) + addNA(ADT_Province) + addNA(ADT_City) +
    addNA(DiagnosisValueGroup) + addNA(DiagnosisGroupLevel1) +
    addNA(DiagnosisGroupLevel2) + addNA(Bill_Insurer) + addNA(Bill_InsurerType1) +
    addNA(Bill_InsurerType2) + addNA(Bill_InsurerBox) +
    addNA(ADT_AdmissionType) + addNA(Bill_RecordType) + addNA(P_MaritalStatus) +
    addNA(Gender) + addNA(MonthNumberOfYear) + addNA(CalenderYear),
    na.action = na.exclude)[, -1]

#Creating a matrix that combines the continuous and categorical variables
x <- as.matrix(data.frame(Bill_TotalBasicInsurance, Bill_TotalPatient,
    Bill_TotalCost1, Bill_TotalCost2, Bill_TotalCost3, Bill_TotalCost4,
    Bill_TotalCost5, Bill_TotalCost6, Bill_TotalCost7, Bill_TotalCost8,
    Bill_TotalCost9, Bill_TotalCost10, Bill_TotalCost11, Bill_TotalCost12,
    P_Age, xfactors))

#Running lasso 
glmmod <- glmnet(x, y=Bill_TotalCharge, family="gaussian",alpha=1)

Then I wanted to use the cv.glmnet function to determine the minimum lambda by cross-validation, and to my surprise it returns a six-digit number as lambda.min (I expected lambda, and likewise alpha, to be between zero and one). What is the problem and how can I fix it? The code I used for this is:

 cv.glmmod <- cv.glmnet(x, y=Bill_TotalCharge, alpha=1)
 best.lambda <- cv.glmmod$lambda.min
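
A note on the scale of lambda, offered as a rough sketch rather than a diagnosis of this particular data set: in glmnet only alpha (the elastic-net mixing parameter) is confined to [0, 1]; lambda is a penalty weight expressed in the units of the response. To the best of my understanding, for a Gaussian lasso the largest lambda on the default path is approximately max_j |<x_j, y>| / n, computed on standardised predictors, and at that value every coefficient is zero by construction. The base-R code below uses simulated data (`x_sim`, `y_sim`, `std`, and `lambda_max` are hypothetical names, not part of glmnet) to show that this quantity grows in proportion to the response.

```r
# Sketch with simulated data (not the data from the question).
set.seed(1)
n <- 200
p <- 5
x_sim <- matrix(rnorm(n * p), n, p)        # hypothetical predictors
y_sim <- 1e6 * (x_sim[, 1] + rnorm(n))     # response measured in millions

# Centre each column and scale it using the 1/n variance convention.
std <- function(m) {
  m <- sweep(m, 2, colMeans(m))
  sweep(m, 2, sqrt(colMeans(m^2)), "/")
}

# Assumed formula for the smallest penalty that zeroes every lasso coefficient.
lambda_max <- function(x, y) {
  max(abs(crossprod(std(x), y - mean(y)))) / nrow(x)
}

lam_big   <- lambda_max(x_sim, y_sim)         # response in original units
lam_small <- lambda_max(x_sim, y_sim / 1e6)   # same response, rescaled
print(c(lam_big, lam_small))
```

If Bill_TotalCharge takes values in the millions, a six-digit lambda.min is therefore not necessarily a sign of a problem. It may also be worth inspecting the coefficients at the selected penalty with `coef(cv.glmmod, s = "lambda.min")` rather than the first column of `coef(glmmod)`, since that first column corresponds to the largest lambda on the path.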

Thanks in advance for any help.

far
  • Following these guidelines will greatly increase your chances of getting a helpful answer: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Jan Aug 26 '17 at 06:30
  • Your lasso and CV code looks fine to me, so I suspect it has something to do with your data that prevents `cv.glmnet` from converging to lambda with minimum loss. Please provide your `x` matrix and `y` variable so we can check for any issues. – acylam Aug 27 '17 at 03:42
  • @useR: I don't know how to attach my x matrix so that you can access it. Could you please guide me? – far Aug 27 '17 at 08:54
  • No problem, just copy and paste the output of `dput(x)` into your question. That way we can reproduce your issue. You can also read this: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?noredirect=1&lq=1 – acylam Aug 28 '17 at 13:13
  • As my data is big, the output of `dput(x)` is not shown completely in the console. – far Aug 29 '17 at 15:54
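
When the full `dput()` output is too long for the console, one option is to share only a small slice of the matrix, or to write the output to a file. A minimal sketch, using a made-up matrix `x_demo` in place of the real `x` (the names `x_demo`, `x_small`, and `out_file` are assumptions for illustration):

```r
# Hypothetical stand-in for the real predictor matrix.
x_demo <- matrix(seq_len(5000), nrow = 1000, ncol = 5,
                 dimnames = list(NULL, paste0("V", 1:5)))

# Share only the first few rows so the dput() output is short enough to paste.
x_small <- head(x_demo, 10)
dput(x_small)

# For the full object, write to a file instead of the console,
# then read it back with dget() to confirm nothing was lost.
out_file <- tempfile(fileext = ".txt")
dput(x_demo, file = out_file)
x_back <- dget(out_file)
```

A slice of the first rows is usually enough for a reproducible example, provided the problem still shows up on that subset.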

0 Answers