3

How do I convert a factor in R to several indicator variables, one for each level?

Ben Bolker
Mohamed Aly
  • http://stackoverflow.com/questions/5048638/automatically-expanding-an-r-factor-into-a-collection-of-1-0-indicator-variables/5048726#5048726 – Ben Bolker Feb 17 '13 at 15:09

4 Answers

8

One way is to use model.matrix():

model.matrix(~Species, iris)

    (Intercept) Speciesversicolor Speciesvirginica
1             1                 0                0
2             1                 0                0
3             1                 0                0

....

148           1                 0                1
149           1                 0                1
150           1                 0                1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$Species
[1] "contr.treatment"
Andrie
  • I think you have to add `-1` to the formula, otherwise there will be a missing level in the resulting matrix. – juba Feb 17 '13 at 14:12
  • @juba That's a good point, but I think it depends on your objective. In dummy coding, you need `n-1` dummy variables to represent `n` levels. So, in the `iris$Species` example, values of `0` and `0` in the two dummy columns mean the species is `setosa`. – Andrie Feb 17 '13 at 14:16
  • @Andrie You're right, it depends on the result you want to get; I didn't think about this. – juba Feb 17 '13 at 14:17
  • @Andrie are you aware of some standard way to reverse this? I.e. get a factor variable from a given model.matrix? – Matt Bannert Oct 11 '15 at 09:35
  • How do you merge model.matrix output back into the original dataframe? – stackoverflowuser2010 May 21 '16 at 00:51
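The last two comments ask how to reverse the encoding and how to merge it back into the data frame. What follows is only a minimal sketch. It assumes the full indicator matrix (the `-1` formula juba mentions) and a data frame with no missing values, since `model.matrix()` drops rows containing `NA` under the default `na.action`. The names `ind`, `iris_wide`, and `recovered` are made up for illustration.

```r
# Full set of indicators: dropping the intercept keeps one column per level
ind <- model.matrix(~ Species - 1, data = iris)

# Merge back: rows line up because no rows were dropped (iris has no NAs)
iris_wide <- cbind(iris, ind)

# Reverse: find the column holding the 1 in each row, then map it to a level
recovered <- factor(levels(iris$Species)[max.col(ind)],
                    levels = levels(iris$Species))
all(recovered == iris$Species)
```

For a treatment-coded matrix (with the intercept), the same idea would need an extra step, because a row of all zeros in the dummy columns stands for the baseline level.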
5

There are several ways to do it, but you can use model.matrix; the `-1` in the formula drops the intercept so that every level gets its own column:

color <- factor(c("red","green","red","blue"))
data.frame(model.matrix(~color-1))
#   colorblue colorgreen colorred
# 1         0          0        1
# 2         0          1        0
# 3         0          0        1
# 4         1          0        0
juba
3

If I understood your question correctly, use the model.matrix() function, like this:

dd <- data.frame(a = gl(3,4), b = gl(4,1,12))
model.matrix(~ a + b, dd)
   (Intercept) a2 a3 b2 b3 b4
1            1  0  0  0  0  0
2            1  0  0  1  0  0
3            1  0  0  0  1  0
4            1  0  0  0  0  1
5            1  1  0  0  0  0
6            1  1  0  1  0  0
7            1  1  0  0  1  0
8            1  1  0  0  0  1
9            1  0  1  0  0  0
10           1  0  1  1  0  0
11           1  0  1  0  1  0
12           1  0  1  0  0  1
attr(,"assign")
[1] 0 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"

attr(,"contrasts")$b
[1] "contr.treatment"
MYaseen208
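With more than one factor, the output above uses treatment contrasts, so the first level of each factor has no column. Removing the intercept alone restores the full set of columns only for the first factor. One possible way to get an indicator for every level of every factor is to pass unreduced contrast matrices through `contrasts.arg`. This is a sketch rather than the answerer's method, and the name `full` is made up for illustration.

```r
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))

# contrasts = FALSE returns an identity matrix: one column per level
full <- model.matrix(
  ~ a + b - 1, data = dd,
  contrasts.arg = list(
    a = contrasts(dd$a, contrasts = FALSE),
    b = contrasts(dd$b, contrasts = FALSE)
  )
)
colnames(full)
```

Each row of `full` then contains exactly two ones, one for the level of `a` and one for the level of `b`. Such a matrix is rank-deficient, so it suits feature construction better than direct use as a regression design matrix.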
2

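Try this, which indexes the rows of an identity matrix by the factor's integer codes:

# Example factor with up to three levels
myfactors <- factor(sample(c("f1", "f2", "f3"), 10, replace = TRUE))
# Row i of the identity matrix is the indicator vector for level i
myIndicators <- diag(nlevels(myfactors))[myfactors, ]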
Aditya Sihag
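The identity-matrix lookup returns an unnamed matrix. A small sketch of how it could be labelled and checked against `model.matrix()` follows; the seed, the `drop = FALSE` safeguard, and the name `mm` are assumptions added for illustration and are not part of the answer.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
myfactors <- factor(sample(c("f1", "f2", "f3"), 10, replace = TRUE))

# drop = FALSE keeps a matrix even if the factor has a single element
myIndicators <- diag(nlevels(myfactors))[myfactors, , drop = FALSE]
colnames(myIndicators) <- levels(myfactors)

# Same 0/1 pattern as model.matrix() without an intercept
mm <- model.matrix(~ myfactors - 1)
all(myIndicators == mm)
```

Indexing by a factor uses its integer codes rather than its labels, which is why row `i` of the identity matrix ends up representing level `i`.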