How do I convert a factor in R to several indicator variables, one for each level?
3
-
http://stackoverflow.com/questions/5048638/automatically-expanding-an-r-factor-into-a-collection-of-1-0-indicator-variables/5048726#5048726 – Ben Bolker Feb 17 '13 at 15:09
4 Answers
8
One way is to use model.matrix():
model.matrix(~Species, iris)
    (Intercept) Speciesversicolor Speciesvirginica
1             1                 0                0
2             1                 0                0
3             1                 0                0
....
148           1                 0                1
149           1                 0                1
150           1                 0                1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$Species
[1] "contr.treatment"

Andrie
-
I think you have to add `-1` to the formula; otherwise one level will be missing from the resulting matrix. – juba Feb 17 '13 at 14:12
-
@juba That's a good point, but I think it depends on your objective. In dummy coding, you need `n-1` dummy variables to represent `n` levels. So, in the `iris$Species` example, values of `0` and `0` mean the species is `setosa`. – Andrie Feb 17 '13 at 14:16
-
@Andrie You're right, it depends on the result you want to get; I didn't think about this. – juba Feb 17 '13 at 14:17
-
@Andrie are you aware of some standard way to reverse this? I.e. get a factor variable from a given model.matrix? – Matt Bannert Oct 11 '15 at 09:35
-
How do you merge model.matrix output back into the original dataframe? – stackoverflowuser2010 May 21 '16 at 00:51
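To illustrate the points raised in these comments, here is a minimal sketch using the built-in `iris` data. The object names (`with_int`, `no_int`, `iris_ind`, `recovered`) are made up for illustration, and the `max.col()` step is only one possible way to go back to a factor; it assumes a full indicator matrix with exactly one 1 per row.

```r
# With the intercept: "(Intercept)" plus n - 1 indicator columns
with_int <- model.matrix(~ Species, iris)

# Without the intercept (-1): one indicator column per level
no_int <- model.matrix(~ Species - 1, iris)
colnames(no_int)

# Attach the indicator columns to the original data frame
iris_ind <- cbind(iris, no_int)

# Recover the factor: find the column holding the 1 in each row
# and map that column position back to the level labels
lev <- levels(iris$Species)
recovered <- factor(lev[max.col(no_int, ties.method = "first")], levels = lev)
```

If the matrix was built with the intercept (so the reference level has no column of its own), the same idea needs an extra step to treat all-zero rows as the reference level.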
5
There are several ways to do it; one is model.matrix:
color <- factor(c("red","green","red","blue"))
data.frame(model.matrix(~color-1))
#   colorblue colorgreen colorred
# 1         0          0        1
# 2         0          1        0
# 3         0          0        1
# 4         1          0        0

juba
3
If I understood your question correctly, use the model.matrix command, like this:
dd <- data.frame(a = gl(3,4), b = gl(4,1,12))
model.matrix(~ a + b, dd)
   (Intercept) a2 a3 b2 b3 b4
1            1  0  0  0  0  0
2            1  0  0  1  0  0
3            1  0  0  0  1  0
4            1  0  0  0  0  1
5            1  1  0  0  0  0
6            1  1  0  1  0  0
7            1  1  0  0  1  0
8            1  1  0  0  0  1
9            1  0  1  0  0  0
10           1  0  1  1  0  0
11           1  0  1  0  1  0
12           1  0  1  0  0  1
attr(,"assign")
[1] 0 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
attr(,"contrasts")$b
[1] "contr.treatment"
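Note that with treatment contrasts each factor loses its first level, and dropping the intercept with `-1` only restores the full set of columns for the first factor in the formula. If you want one indicator per level for every factor, one possible approach (a sketch, not the only way) is to pass full, unreduced contrast matrices through `contrasts.arg`. The name `full` is just for illustration.

```r
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))

# contrasts(x, contrasts = FALSE) returns an identity-style matrix,
# i.e. one column per level instead of n - 1
full <- model.matrix(
  ~ a + b - 1, dd,
  contrasts.arg = lapply(dd, contrasts, contrasts = FALSE)
)
colnames(full)
```

Each row of `full` should then contain exactly two 1s, one for the level of `a` and one for the level of `b`. Such a matrix is rank-deficient, so it suits indicator encoding rather than fitting an ordinary linear model.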

MYaseen208
2
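Try this:
# Row i of the identity matrix is the indicator vector for level i,
# so indexing its rows by the factor (i.e. by its integer codes) expands it
myfactors <- factor(sample(c("f1", "f2", "f3"), 10, replace = TRUE))
myIndicators <- diag(nlevels(myfactors))[myfactors, ]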
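The result above has no column names. A small variation on the same idea, using a fixed factor so the outcome is reproducible, labels the columns with the levels. The names `f` and `ind` are illustrative; `as.integer()` only makes the use of the factor's integer codes explicit, and `drop = FALSE` keeps the result a matrix even for a single-element factor.

```r
f <- factor(c("red", "green", "red", "blue"))

# Pick rows of the identity matrix by the factor's integer codes
ind <- diag(nlevels(f))[as.integer(f), , drop = FALSE]

# Label each indicator column with its level
colnames(ind) <- levels(f)
ind
```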

Aditya Sihag