
In Python, you can use the dmatrices function from the patsy module to expand categorical variables into 0/1 indicator matrices for use in a regression.

Is there a library or function in R that does the same thing?

jeangelj
  • `?model.matrix`. This is probably a duplicate ... – Ben Bolker Jun 07 '17 at 23:09
  • e.g. https://stackoverflow.com/questions/5048638/automatically-expanding-an-r-factor-into-a-collection-of-1-0-indicator-variables/5048726#5048726 – Ben Bolker Jun 07 '17 at 23:10
  • Also worth pointing out that in most regression modelling functions in R you don't actually have to do this, since factors are automatically treated like this based on their `contrasts` attribute. – Marius Jun 07 '17 at 23:15
  • oh wow - so in R, I can just use the categorical columns in my dataframe as the coefficients and it will automatically use 0/1 matrices in my regression model? – jeangelj Jun 07 '17 at 23:20
  • that's right ... – Ben Bolker Jun 07 '17 at 23:31

1 Answer


Suppose we have a data frame in which columns x and y are numeric and column f is a factor. We can pass the formula straight to lm, which converts it to an appropriate model matrix, including 0/1 columns for f, and then runs the regression on that:

# test data
set.seed(123)
DF <- transform(data.frame(f = gl(3, 5, labels = letters[1:3]), x = 1:15),
         y = rnorm(15, 1:15))

# run regression
fo <- y ~ x + f
lm(fo, DF)

The model matrix is computed as part of the call above, so there is no need to build it explicitly, but if you want to see it anyway, try this:

# view model matrix
model.matrix(fo, DF)
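
To make the 0/1 coding easier to see, here is a small sketch that rebuilds the same test data and looks only at the factor f. It assumes R's default treatment contrasts for unordered factors; the names mm_treatment, mm_full, fit_formula and fit_matrix are just illustrative. It also checks that fitting on the explicit model matrix with lm.fit gives the same coefficients as lm with the formula.

```r
# Recreate the test data so this block runs on its own
set.seed(123)
DF <- transform(data.frame(f = gl(3, 5, labels = letters[1:3]), x = 1:15),
                y = rnorm(15, 1:15))

# Default treatment coding: an intercept plus one indicator column per
# non-baseline level (fb, fc); level "a" is absorbed into the intercept
mm_treatment <- model.matrix(~ f, DF)
colnames(mm_treatment)

# Dropping the intercept gives one 0/1 column per level (fa, fb, fc),
# which is closer to a plain "one-hot" encoding
mm_full <- model.matrix(~ 0 + f, DF)
colnames(mm_full)

# The coding that R applies to this factor
contrasts(DF$f)

# Fitting on the explicit model matrix reproduces the lm() coefficients
fit_formula <- lm(y ~ x + f, DF)
fit_matrix  <- lm.fit(model.matrix(y ~ x + f, DF), DF$y)
all.equal(coef(fit_formula), fit_matrix$coefficients)
```

For an actual regression the first form (with an intercept and a baseline level) is usually what you want, since a full set of indicator columns plus an intercept would be collinear.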
G. Grothendieck