Create mutually exclusive dummy variables from categorical variable in R

Question

A while ago, I asked a question about creating a categorical variable from mutually exclusive dummy variables. Now, it turns out I want to do the opposite.

How would one go about creating dummy variables in a long-form dataset from a single categorical variable (time)? e.g. the dataframe below...

id     time   
1      1       
1      2       
1      3      
1      4

would become...

id     time    time_dummy_1   time_dummy_2    time_dummy_3  time_dummy_4
1      1       1              0               0             0
1      2       0              1               0             0
1      3       0              0               1             0
1      4       0              0               0             1

I'm sure this is trivial (and please let me know if this question is a duplicate -- I'm not sure it is, but will happily remove if so). Thanks!

Can you give an example input/output? It's not entirely clear. — Synergist, Jun 03 '15 at 14:11
http://stackoverflow.com/questions/11952706/generate-a-dummy-variable-in-r — rmuc8, Jun 03 '15 at 14:16
if you use `library(tidyr)` and `library(plyr)`, it's simply: `df %>% mutate(time2=paste0("time_dummy_",time)) %>% spread(time2, id, fill=0)` — C8H10N4O2, Jun 03 '15 at 17:13

score 4 · Answer 1 · answered Jun 03 '15 at 14:23

You can try the dummies package.

R Code:

# Create the data frame
id <- c(1, 1, 1, 1)
time <- c(1, 2, 3, 4)
data <- data.frame(id, time)

# Install the package once, then load it
install.packages("dummies")
library(dummies)
# Append one 0/1 column per distinct value of time
data <- cbind(data, dummy(data$time))

Output:

  id time data1 data2 data3 data4
   1    1     1     0     0     0
   1    2     0     1     0     0
   1    3     0     0     1     0
   1    4     0     0     0     1

You can then rename the newly added dummy columns to suit your needs.

R Code:

# Rename column headers
colnames(data)[colnames(data)=="data1"] <- "time_dummy_1"
colnames(data)[colnames(data)=="data2"] <- "time_dummy_2"
colnames(data)[colnames(data)=="data3"] <- "time_dummy_3"
colnames(data)[colnames(data)=="data4"] <- "time_dummy_4"

Output:

  id time time_dummy_1 time_dummy_2 time_dummy_3 time_dummy_4
   1    1            1            0            0            0
   1    2            0            1            0            0
   1    3            0            0            1            0
   1    4            0            0            0            1
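
The four assignments above can also be collapsed into one pattern-based rename. The following is a minimal sketch, assuming the dummy columns are named `data1` to `data4` as in the output shown; the data frame is rebuilt by hand here (a stand-in, not the package's output) so the snippet runs without the dummies package.

```r
# Hypothetical stand-in for the data frame produced above, built by hand
# so the example does not depend on the dummies package.
data <- data.frame(
  id = c(1, 1, 1, 1),
  time = c(1, 2, 3, 4),
  data1 = c(1, 0, 0, 0),
  data2 = c(0, 1, 0, 0),
  data3 = c(0, 0, 1, 0),
  data4 = c(0, 0, 0, 1)
)

# Select only the columns named "data" followed by digits
is_dummy <- grepl("^data[0-9]+$", colnames(data))

# Swap the leading "data" for "time_dummy_" in those columns
colnames(data)[is_dummy] <- sub("^data", "time_dummy_", colnames(data)[is_dummy])

print(colnames(data))
```

Matching on a pattern avoids writing one line per level, which matters once time takes more than a handful of values.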

Hope this helps.

score 1 · Answer 2 (accepted) · rmuc8 · 2015-06-03T14:20:52.317

If your data is

id <- c(1,1,1,1)
time <- c(1,2,3,4)
df <- data.frame(id,time)

you can try

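# Distinct time values, in order of first appearance
unique.time <- unique(df$time)
# Create a dichotomous (0/1) dummy variable for each time value
x <- sapply(unique.time, function(t) as.numeric(df$time == t))
# Name the columns of the resulting matrix; cbind(df, x) appends them to df
colnames(x) <- paste0("time_dummy_", unique.time)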

or

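# Convert time to a factor so each distinct value becomes a level
time.f <- factor(time)
# Returns an intercept column plus dummies for every level except the first
dummies <- model.matrix(~time.f)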
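
Because `model.matrix(~time.f)` keeps an intercept and treats the first level as the reference, it yields three dummy columns rather than the four the question asks for. A minimal sketch of one way to get all four, assuming the same `id`/`time` data and the `time_dummy_` naming from the question (the name `result` is only illustrative):

```r
df <- data.frame(id = c(1, 1, 1, 1), time = c(1, 2, 3, 4))

# Treat time as a factor so each distinct value becomes a level
time.f <- factor(df$time)

# "0 +" removes the intercept, so every level gets its own 0/1 column
dummies <- model.matrix(~ 0 + time.f)

# Default column names are time.f1, time.f2, ...; rename to the requested pattern
colnames(dummies) <- paste0("time_dummy_", levels(time.f))

# Bind the dummy columns back onto the original data frame
result <- cbind(df, dummies)
print(result)
```

Each row of `result` has exactly one dummy set to 1, so the columns are mutually exclusive.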

edited Jun 03 '15 at 14:20

answered Jun 03 '15 at 14:15

rmuc8

2,869
7
27
36

+1 for `model.matrix`! That's really neat. [This answer](http://stackoverflow.com/a/11952708/1446892) says `model.matrix` treats time=1 as the default or intercept value, but how do you "change how the "default" is chosen by messing with contrasts.arg in model.matrix"? And how do you assign the `dummy_time_1` column in the table? – Synergist Jun 03 '15 at 14:30

@Synergist: you can suppress the intercept by adding `0` or `-1` to the formula, e.g. `model.matrix(~0 + time.f)`. With more variables it becomes a little trickier to get all levels of each factor; see [here](http://stackoverflow.com/questions/4560459/all-levels-of-a-factor-in-a-model-matrix-in-r?/4569239#4569239) for a neat way – user20650 Jun 03 '15 at 17:56

@user20650 thank you! – Synergist Jun 03 '15 at 17:58
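
For the multi-factor case raised in the comments, the `contrasts.arg` argument of `model.matrix` can be given full (identity) contrast matrices so that no level is dropped. This is a rough sketch of that idea under assumed data: the `group` column and the name `full_contrasts` are made up for illustration and do not appear in the question.

```r
# Hypothetical data with two categorical variables
df <- data.frame(
  time  = factor(c(1, 2, 3, 1, 2, 3)),
  group = factor(c("a", "a", "a", "b", "b", "b"))
)

# contrasts(f, contrasts = FALSE) returns an identity matrix over the levels
# of f, so no level is dropped as the reference category
full_contrasts <- lapply(df[c("time", "group")], contrasts, contrasts = FALSE)

mm <- model.matrix(~ time + group, data = df, contrasts.arg = full_contrasts)

# Drop the intercept column to keep only the dummy columns
dummies <- mm[, colnames(mm) != "(Intercept)", drop = FALSE]
print(colnames(dummies))
```

The resulting matrix is rank-deficient if used directly in a regression with an intercept, so this form suits data preparation rather than model fitting.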
