
I would like to convert my data frame into a matrix in which a single factor column is expanded into several indicator columns, each holding 1 or 0 depending on the factor level. For example:

C1 C2 C3
A  3  5
B  3  4
A  1  1

should turn into something like:

C1_A C1_B C2 C3
1      0  3  5
0      1  3  4
1      0  1  1

How can I do this in R? I tried data.matrix and as.matrix, but neither returned what I wanted: they assign an integer code to the single factor column rather than expanding it.

Sven Hohenstein
BBSysDyn

3 Answers


Assuming dat is your data frame:

cbind(dat, model.matrix( ~ 0 + C1, dat))

  C1 C2 C3 C1A C1B
1  A  3  5   1   0
2  B  3  4   0   1
3  A  1  1   1   0

This solution works with any number of factor levels and without manually specifying column names.

If you want to exclude the column C1, you could use this command:

cbind(dat[-1], model.matrix( ~ 0 + C1, dat))
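
As a self-contained sketch of the same idea, the formula can also use `.` to pick up every column, so no column name has to be typed (this is the variant suggested in the comments). The data frame is rebuilt here from the question, and the final renaming step is only an assumption about the desired `C1_A`-style names.

```r
# Rebuild the example data from the question; C1 is made a factor explicitly.
dat <- data.frame(C1 = factor(c("A", "B", "A")),
                  C2 = c(3, 3, 1),
                  C3 = c(5, 4, 1))

# "~ 0 + ." uses every column and drops the intercept, so the first
# factor gets one indicator column per level.
m <- model.matrix(~ 0 + ., data = dat)

# Optional: rename C1A/C1B to C1_A/C1_B to match the layout in the question.
colnames(m) <- sub("^C1", "C1_", colnames(m))
m
```

Note that with default contrasts only the first factor in the formula receives a column for every level; any further factor gets one column fewer, because its first level acts as the reference.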
Sven Hohenstein
  • The OP seems to want `model.matrix(~.+0,dat)`. – Roland Dec 16 '12 at 13:47
  • @Roland Good idea +1. This would be even easier. – Sven Hohenstein Dec 16 '12 at 13:48
  • @Sven, this worked, thanks. It still keeps C1 in the result though (in addition to the C1_A, C1_B columns). Any idea how I would remove the original column? More generally, an easy R way of saying "give me all columns except _that_ one" would do. – BBSysDyn Dec 16 '12 at 15:33
  • @user423805 See the update of my answer. Or have a look at Roland's comment. – Sven Hohenstein Dec 16 '12 at 15:42
  • OK, I just found this: dat <- dat[, setdiff(names(dat), c("C1"))]. After conversion, this snippet can be used to remove columns by name. Indexing can get tricky IMHO. – BBSysDyn Dec 16 '12 at 15:45
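
For the "all columns except that one" question raised in the comments, here is a small sketch of two common base R ways to drop columns by name. The variable names `drop_cols`, `kept` and `kept2` are made up for illustration.

```r
dat <- data.frame(C1 = c("A", "B", "A"), C2 = c(3, 3, 1), C3 = c(5, 4, 1))

drop_cols <- "C1"  # hypothetical: names of the columns to remove

# Keep every column whose name is not listed in drop_cols.
# drop = FALSE keeps the result a data frame even if only one column remains.
kept <- dat[, setdiff(names(dat), drop_cols), drop = FALSE]

# The same selection with subset() and a negated column name.
kept2 <- subset(dat, select = -C1)
```

The `setdiff()` form is convenient when the names to drop are held in a character vector; `subset()` reads more naturally when typing a name interactively.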

Let's call your data.frame df:

library(reshape2)
dcast(df, C2 + C3 ~ C1, fun.aggregate = length, fill = 0)

  C2 C3 A B
1  1  1 1 0
2  3  4 0 1
3  3  5 1 0
Roland
  • Thanks for both answers. Isn't there a way to do this conversion without specifying any column names, such as C1? Simply convert(df), and it handles the factors. lm() and other regression methods do this internally, right? – BBSysDyn Dec 16 '12 at 13:39
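
One way to get a "convert(df)"-style call that names no columns is to wrap model.matrix in a small helper. The sketch below is only an illustration: `expand_factors` is a hypothetical name, not an existing R function, and it assumes that every factor or character column should be expanded into a full set of 0/1 columns.

```r
# expand_factors() is a hypothetical helper name, not part of base R.
expand_factors <- function(df) {
  # Treat factor and character columns as categorical.
  is_cat <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
  if (!any(is_cat)) return(as.matrix(df))
  df[is_cat] <- lapply(df[is_cat], factor)

  # Identity "contrasts" request one indicator column per level,
  # instead of dropping a reference level for the later factors.
  full <- lapply(df[is_cat], contrasts, contrasts = FALSE)
  model.matrix(~ 0 + ., data = df, contrasts.arg = full)
}

# The data from the question: one categorical column.
dat <- data.frame(C1 = c("A", "B", "A"), C2 = c(3, 3, 1), C3 = c(5, 4, 1))
m1 <- expand_factors(dat)

# A made-up data frame with two categorical columns.
two <- data.frame(F1 = c("a", "b", "a"), F2 = c("x", "y", "y"), n = 1:3)
m2 <- expand_factors(two)
```

Because every level gets its own column, the indicator columns of each factor sum to 1 in every row. That is fine for a plain indicator matrix, but it makes the columns linearly dependent if the result is fed into a regression that also fits an intercept.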
dat <- read.table(text = 'C1 C2 C3
A  3  5
B  3  4
A  1  1', header = TRUE)

Using transform

transform(dat,C1_A =ifelse(C1=='A',1,0),C1_B =ifelse(C1=='B',1,0))[,-1]
  C2 C3 C1_A C1_B
1  3  5    1    0
2  3  4    0    1
3  1  1    1    0

Or, for more flexibility, with within:

within(dat,{ 
             C1_A =ifelse(C1=='A',1,0)
             C1_B =ifelse(C1=='B',1,0)})

  C1 C2 C3  C1_B C1_A
1  A  3  5    0    1
2  B  3  4    1    0
3  A  1  1    0    1
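
The two calls above spell out each level by hand. As a sketch of how the same comparison could be generated for any number of levels, the loop below builds one column per level; the `C1_<level>` naming is an assumption taken from the question, and the data frame is rebuilt here so the block runs on its own.

```r
dat <- data.frame(C1 = factor(c("A", "B", "A")),
                  C2 = c(3, 3, 1),
                  C3 = c(5, 4, 1))

# Add one 0/1 column per level of C1, named C1_<level>.
for (lv in levels(dat$C1)) {
  dat[[paste0("C1_", lv)]] <- as.integer(dat$C1 == lv)
}

# Drop the original factor column once the indicators exist.
dat$C1 <- NULL
dat
```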
agstudy