I have data that looks like this:

> View(mydata)   

   Gender   Race  Agegroup  Date       ..... #m columns
#1 Male   Asian     1      2015/04/20 .....
#2 Female  White    2      2015/04/15 .....
.
.
#n rows

I want to transform mydata into this format:

Gender=Male  Gender=Female  Race=Asian  Race=White   Agegroup = 1   Agegroup = 2 ......
    1             0              0             0              1               0
    0             1              0             1              0               1
    .             .              .             .              .               .
    .             .              .             .              .               .

I am new to R. I know a for loop would work, but is there a cleaner way to do this?

Guanhua Lee
  • Please post a few lines of your dataset and the expected result based on them. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – akrun Apr 20 '15 at 14:52
  • This might be what you are looking for... http://stackoverflow.com/questions/11952706/generate-a-dummy-variable-in-r – cory Apr 20 '15 at 14:59

2 Answers

You can use model.matrix to expand out multiple variables in a single call:

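# stringsAsFactors = TRUE is needed in R >= 4.0, where strings are no longer
# converted to factors by default; contrasts() only works on factors.
(d <- data.frame(Gender = c("Male", "Male", "Female", "Male"),
                 Race = c("White", "Asian", "White", "Black"),
                 AgeGroup = factor(c(1, 2, 2, 1)),
                 stringsAsFactors = TRUE))
#   Gender  Race AgeGroup
# 1   Male White        1
# 2   Male Asian        2
# 3 Female White        2
# 4   Male Black        1

# lapply (rather than sapply) always returns a named list of contrast matrices
model.matrix(~ . + 0, data = d, contrasts.arg = lapply(d, contrasts, contrasts = FALSE))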
#   GenderFemale GenderMale RaceAsian RaceBlack RaceWhite AgeGroup1 AgeGroup2
# 1            0          1         0         0         1         1         0
# 2            0          1         1         0         0         0         1
# 3            1          0         0         0         1         0         1
# 4            0          1         0         1         0         1         0
# ...

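The contrasts.arg part of the model.matrix call is borrowed code that ensures all levels of all factors show up in your output. With the default contrasts, model.matrix would drop the first level of every factor after the first one.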
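
To make the effect of contrasts.arg concrete, here is a minimal sketch that compares the default call with the full-dummy call. The data frame d2 is a made-up toy example, not the asker's data, and the names m_default and m_full are only for illustration.

```r
# Hypothetical toy data (not from the question). stringsAsFactors = TRUE
# turns the character columns into factors, which contrasts() requires.
d2 <- data.frame(Gender = c("Male", "Female", "Male"),
                 Race   = c("Asian", "White", "Asian"),
                 stringsAsFactors = TRUE)

# Default contrasts: with the intercept removed, the first factor keeps all
# of its levels, but every later factor loses its first level.
m_default <- model.matrix(~ . + 0, data = d2)
colnames(m_default)
# "GenderFemale" "GenderMale" "RaceWhite"

# Full dummy coding: one column for every level of every factor.
m_full <- model.matrix(~ . + 0, data = d2,
                       contrasts.arg = lapply(d2, contrasts, contrasts = FALSE))
colnames(m_full)
# "GenderFemale" "GenderMale" "RaceAsian" "RaceWhite"
```

Note that the RaceAsian column is missing from the default result, which is exactly what the contrasts.arg argument prevents.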

josliber
  • I noticed that names are factorized but integers are not. Is there a function to do both? – Guanhua Lee Apr 20 '15 at 15:52
  • @GuanhuaLee I've updated the code to include a variable that takes integer values. Note that you need to convert it to a factor for `model.matrix` to split it into separate columns. For instance, you could use `d$AgeGroup <- factor(d$AgeGroup)`. – josliber Apr 20 '15 at 16:33
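
As a rough sketch of the conversion described in the comment above, the following turns an integer-coded column into a factor before calling model.matrix. The data frame d3 is hypothetical, and the example assumes every column is categorical; for data with continuous or date columns (such as the Date column in the question), you would convert only the columns you choose.

```r
# Hypothetical toy data: AgeGroup is stored as integers, not as a factor.
d3 <- data.frame(Gender   = c("Male", "Female", "Male"),
                 AgeGroup = c(1L, 2L, 2L),
                 stringsAsFactors = TRUE)

# Convert every column to a factor so that numeric codes are treated as
# categories. Assumption: all columns here are categorical.
d3[] <- lapply(d3, factor)

m <- model.matrix(~ . + 0, data = d3,
                  contrasts.arg = lapply(d3, contrasts, contrasts = FALSE))
colnames(m)
# "GenderFemale" "GenderMale" "AgeGroup1" "AgeGroup2"
```

Assigning with d3[] keeps d3 a data frame while replacing each column with its factor version.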

You could use the reshape2 package:

DF <- data.frame(gender = c("m", "f", "m"),
                 agegroup = factor(c(1, 2, 2)))


library(reshape2)
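# Build one indicator table per column; they are merged on the row id below.
dum <- lapply(names(DF), function(x, df) {
  d <- df[, x, drop = FALSE]    # keep a single column as a data frame
  d$id <- seq_along(d[, 1])     # add a row identifier
  # one column per level, counting how often it occurs in each row (0 or 1)
  res <- dcast(d, id ~ ..., fun.aggregate = length)
  # label the new columns as "column=level"
  names(res)[-1] <- paste(names(d)[1], names(res)[-1], sep = "=")
  res
}, df = DF)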


Reduce(merge, dum)
#  id gender=f gender=m agegroup=1 agegroup=2
#1  1        0        1          1          0
#2  2        1        0          0          1
#3  3        0        1          0          1
Roland