I am working on a data set that contains only multilevel factors as predictors and a binary response variable. It is currently a data frame. I want to run glmnet on it, so I need to build a model matrix (model.matrix). I read in "All Levels of a Factor in a Model Matrix in R" that one level per factor is taken as the reference level. However, I don't know whether this happens because that example mixes numerical and factor variables. In any case, can somebody roughly tell me how I would build a model matrix from the built-in mtcars data set?

Hein

1 Answer

This is a guess, since you have not described the function(s) you are using. My hunch is that you are using one of the "machine learning" algorithms that require you to supply separate response vectors and predictor matrices. (If I'm wrong on this, then you definitely need to provide more details.)

Presuming you would use mpg as the outcome (Y) variable, and using only the second and third columns (cyl and disp), with dummy variables constructed only for the first of these (cyl), this model.matrix call would build an appropriate X object:

> model.matrix(~as.factor(cyl)+disp, mtcars[2:3])
                    (Intercept) as.factor(cyl)6 as.factor(cyl)8  disp
Mazda RX4                     1               1               0 160.0
Mazda RX4 Wag                 1               1               0 160.0
Datsun 710                    1               0               0 108.0
Hornet 4 Drive                1               1               0 258.0
Hornet Sportabout             1               0               1 360.0
Valiant                       1               1               0 225.0
Duster 360                    1               0               1 360.0
Merc 240D                     1               0               0 146.7
Merc 230                      1               0               0 140.8
Merc 280                      1               1               0 167.6
Merc 280C                     1               1               0 167.6
Merc 450SE                    1               0               1 275.8
##########Snipped remainder of output.

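The formula object specifies the structure of the model. The (Intercept) term represents the shared baseline level against which all factor variables are referenced. Here that is cyl = 4, the first level of as.factor(cyl), which is why it has no dummy column of its own.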
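
For the all-factor case described in the question, the same idea can be sketched as follows. This is only an illustration under assumptions that are not in the original post: am stands in for the binary response, and cyl and gear are converted to factors to stand in for the multilevel predictors. Since glmnet fits its own intercept by default, the (Intercept) column is dropped here before the matrix would be passed on.

```r
# Sketch only: am as binary response and cyl/gear as factor predictors
# are assumptions chosen for illustration, not part of the original question.
dat <- mtcars[, c("am", "cyl", "gear")]
dat$cyl  <- factor(dat$cyl)
dat$gear <- factor(dat$gear)

# model.matrix() expands each factor into dummy columns using the default
# treatment contrasts; the first level of each factor (cyl = 4, gear = 3)
# becomes the reference level and gets no column of its own.
X_full <- model.matrix(am ~ cyl + gear, data = dat)

# Drop the "(Intercept)" column, keeping the result a matrix.
X <- X_full[, -1, drop = FALSE]
y <- dat$am

colnames(X)  # "cyl6" "cyl8" "gear4" "gear5"
dim(X)       # 32 4

# With the glmnet package installed, a fit could then look like:
# fit <- glmnet::glmnet(X, y, family = "binomial")
```

The reference levels appear only implicitly: a row with zeros in both cyl columns is a 4-cylinder car, and a row with zeros in both gear columns has 3 gears. This holds whether or not numeric predictors are also present in the formula.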

IRTFM