Some packages (ri, blockTools) require every column of an input matrix to be numeric. I've been using the following hack, but I'm wondering whether there is a more direct route. First, I run a regression that includes the variables in whatever form I have them (factor, character, numeric, etc.); then I extract the design matrix, which is in the right format apart from the intercept column. How does lm make the design matrix? Can I build that matrix without going to the trouble of running the regression?

N <- 1000
gender <- sample(c("M", "F"), N, replace=TRUE)
age <- sample(18:65, N, replace = TRUE)
lincome <- rnorm(N, 10, 3)
party <- sample(c("D", "R", "I"), N, prob=c(.45, .35,.2), replace=TRUE)
education <- sample(10:20, N, replace=TRUE)

df <- data.frame(gender, age, lincome, party, education)
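# Throwaway regression: the response 1:N is arbitrary, only the design matrix is wanted
fit <- lm(1:N ~ gender + age + lincome + party + education, data=df)
# Extract the design matrix and drop its first column (the intercept)
mat <- model.matrix(fit)[,-1]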

head(df)
head(mat)

This solution works OK, but feels hacky. Is there a better way?

Alex Coppock

1 Answer


See https://stackoverflow.com/a/5048727/4530610

In your case it would be:

mat <- model.matrix( ~ gender + age + lincome + party + education, data=df)[,-1]
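
To answer the "how does lm make the design matrix" part: lm builds a model frame from the formula and data and then calls model.matrix on it, so calling model.matrix yourself with a one-sided formula (no response) skips the fitting step. The sketch below is only a sanity check that the two routes agree on data simulated as in the question; the set.seed call and the names mat_lm and mat_direct are my own additions for illustration.

```r
set.seed(1)  # assumption: fixed seed, added only to make the check reproducible
N <- 1000
df <- data.frame(
  gender    = sample(c("M", "F"), N, replace = TRUE),
  age       = sample(18:65, N, replace = TRUE),
  lincome   = rnorm(N, 10, 3),
  party     = sample(c("D", "R", "I"), N, prob = c(.45, .35, .2), replace = TRUE),
  education = sample(10:20, N, replace = TRUE)
)

# Route from the question: fit a throwaway regression, then extract its design matrix
fit <- lm(1:N ~ gender + age + lincome + party + education, data = df)
mat_lm <- model.matrix(fit)[, -1]

# Direct route: one-sided formula, no response variable and no fitting
mat_direct <- model.matrix(~ gender + age + lincome + party + education, data = df)[, -1]

# Compare the values of the two matrices, ignoring attributes
all.equal(mat_lm, mat_direct, check.attributes = FALSE)

# Character/factor columns have been expanded into 0/1 dummy columns
is.numeric(mat_direct)
colnames(mat_direct)
```

Note that each categorical variable loses one level (the reference level) under the default treatment contrasts, so party with three levels contributes two columns. Whether that is what ri or blockTools expects is something to check against their documentation.
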
bergant