A while ago, I asked a question about creating a categorical variable from mutually exclusive dummy variables. Now, it turns out I want to do the opposite.

How would one go about creating dummy variables in a long-form dataset from a single categorical variable (time)? For example, the data frame below...

id     time   
1      1       
1      2       
1      3      
1      4       

would become...

id     time    time_dummy_1   time_dummy_2    time_dummy_3  time_dummy_4
1      1       1              0               0             0
1      2       0              1               0             0
1      3       0              0               1             0
1      4       0              0               0             1

I'm sure this is trivial (and please let me know if this question is a duplicate -- I'm not sure it is, but will happily remove if so). Thanks!

roody

2 Answers

You can try the `dummies` package.

R Code:

# Create the data frame
id <- c(1,1,1,1)
time <- c(1,2,3,4)
data <- data.frame(id, time)

# install.packages("dummies")  # run once if the package is not installed
library(dummies)
# Append one 0/1 column per distinct value of time
data <- cbind(data, dummy(data$time))

Output:

  id time data1 data2 data3 data4
   1    1     1     0     0     0
   1    2     0     1     0     0
   1    3     0     0     1     0
   1    4     0     0     0     1

Further, you can rename the newly added dummy variable columns to suit your needs:

R Code:

# Rename column headers
colnames(data)[colnames(data)=="data1"] <- "time_dummy_1"
colnames(data)[colnames(data)=="data2"] <- "time_dummy_2"
colnames(data)[colnames(data)=="data3"] <- "time_dummy_3"
colnames(data)[colnames(data)=="data4"] <- "time_dummy_4"

Output:

  id time time_dummy_1 time_dummy_2 time_dummy_3 time_dummy_4
   1    1            1            0            0            0
   1    2            0            1            0            0
   1    3            0            0            1            0
   1    4            0            0            0            1
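
The four assignments above could also be collapsed into a single pattern-based rename. This is only a minimal sketch, and it assumes the dummy columns are named `data1` to `data4` exactly as in the output shown; the intermediate data frame is rebuilt by hand here so the snippet runs without the `dummies` package.

```r
# Rebuild the intermediate result by hand (assumed to match the output above)
data <- data.frame(id = c(1, 1, 1, 1), time = c(1, 2, 3, 4),
                   data1 = c(1, 0, 0, 0), data2 = c(0, 1, 0, 0),
                   data3 = c(0, 0, 1, 0), data4 = c(0, 0, 0, 1))

# Swap the leading "data" prefix for "time_dummy_";
# "id" and "time" do not start with "data", so they are left untouched
colnames(data) <- sub("^data", "time_dummy_", colnames(data))
```

The anchored pattern `^data` keeps the rename from touching any column that merely contains "data" somewhere in the middle of its name.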

Hope this helps.

Manohar Swamynathan

If your data is

id <- c(1,1,1,1)
time <- c(1,2,3,4)
df <- data.frame(id,time)

you can try

unique.time <- as.character(unique(df$time))
# Create a dichotomous dummy variable for each distinct time value
x <- sapply(unique.time, function(t) as.numeric(df$time == t))
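
To reach the layout asked for in the question, the resulting matrix still needs column names and has to be attached to `df`. A minimal sketch, assuming the `time_dummy_` prefix from the question is the desired naming (the name `result` is just for illustration):

```r
id <- c(1, 1, 1, 1)
time <- c(1, 2, 3, 4)
df <- data.frame(id, time)

unique.time <- as.character(unique(df$time))
# One 0/1 column per distinct time value
x <- sapply(unique.time, function(t) as.numeric(df$time == t))
# sapply names the columns after the time values, so add the prefix
colnames(x) <- paste0("time_dummy_", unique.time)
# Attach the dummy columns to the original data frame
result <- cbind(df, x)
```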

or

time.f <- factor(time)
# Includes an intercept column; the first level (time = 1) is the reference
dummies <- model.matrix(~time.f)
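
If one indicator column per time value is wanted, as in the question, the intercept can be dropped from the formula. The following is a sketch rather than a definitive recipe; the `time_dummy_` names and the `result` variable are assumptions taken from the question's desired output.

```r
id <- c(1, 1, 1, 1)
time <- c(1, 2, 3, 4)
df <- data.frame(id, time)

time.f <- factor(df$time)
# "0 +" removes the intercept, so every level gets its own indicator column
dummies <- model.matrix(~ 0 + time.f)
# Default column names are time.f1, time.f2, ...; use the question's prefix instead
colnames(dummies) <- paste0("time_dummy_", levels(time.f))
# Attach the indicators to the original data frame
result <- cbind(df, as.data.frame(dummies))
```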
rmuc8
  • +1 for `model.matrix`! That's really neat. [This answer](http://stackoverflow.com/a/11952708/1446892) says `model.matrix` treats time=1 as the default or intercept value, but how do you "change how the "default" is chosen by messing with contrasts.arg in model.matrix"? And how do you assign the `dummy_time_1` column in the table? – Synergist Jun 03 '15 at 14:30
  • @Synergist: you can suppress the intercept by adding `0` or `-1`, so `model.matrix(~0 + time.f)`. For more variables it becomes a little trickier to get all levels of each factor; see [here](http://stackoverflow.com/questions/4560459/all-levels-of-a-factor-in-a-model-matrix-in-r?/4569239#4569239) for a neat way – user20650 Jun 03 '15 at 17:56
  • @user20650 thank you! – Synergist Jun 03 '15 at 17:58