I have a survey dataset that includes self-reported ethnicity. Participants were allowed to select as many ethnicities as they wanted to. The data structure looks like this:

Hispanic English Indian
1        NA      NA
NA       1       NA
NA       NA      1
NA       1       1
1        1       1

What I want to do is create a new categorical ethnicity variable where the column names take the place of the 1s above. In addition, if someone selected more than one ethnicity, then the categorical ethnicity variable should include all of them, joined by underscores, like this:

Hispanic English Indian Ethnicity
1        NA      NA     Hispanic
NA       1       NA     English
NA       NA      1      Indian
NA       1       1      English_Indian
1        1       1      Hispanic_English_Indian

1 Answer

We can use apply to loop over the rows (MARGIN = 1). For each row, we take the names of the elements that are not NA and paste them together, using collapse = "_" to join them into a single string:

df1$Ethnicity <- apply(df1, 1, function(x) 
     paste(names(x)[!is.na(x)], collapse= "_"))

-output

 df1
  Hispanic English Indian               Ethnicity
1        1      NA     NA                Hispanic
2       NA       1     NA                 English
3       NA      NA      1                  Indian
4       NA       1      1          English_Indian
5        1       1      1 Hispanic_English_Indian

data

df1 <- structure(list(Hispanic = c(1L, NA, NA, NA, 1L),
                      English = c(NA, 1L, NA, 1L, 1L),
                      Indian = c(NA, NA, 1L, 1L, 1L)),
                 class = "data.frame", row.names = c(NA, -5L))
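
A real survey data frame usually has more columns than the ethnicity indicators, and some respondents may have selected nothing. The sketch below is one way the same apply idea might be adapted to that case. The df2 data, the id column, the all-NA sixth row, and the eth_cols name are assumptions made up for illustration; they are not part of the question's data.

```r
# Hypothetical data: same layout as df1, plus an id column and one
# respondent (row 6) who selected no ethnicity at all.
df2 <- data.frame(
  id       = 101:106,
  Hispanic = c(1L, NA, NA, NA, 1L, NA),
  English  = c(NA, 1L, NA, 1L, 1L, NA),
  Indian   = c(NA, NA, 1L, 1L, 1L, NA)
)

# Restrict the row-wise loop to the indicator columns, so that id
# (or a previously created Ethnicity column) is never pasted in.
eth_cols <- c("Hispanic", "English", "Indian")

df2$Ethnicity <- apply(df2[eth_cols], 1, function(x)
  paste(names(x)[!is.na(x)], collapse = "_"))

# paste() with collapse on an empty vector returns "", so rows with
# no selection are recoded to a missing value.
df2$Ethnicity[df2$Ethnicity == ""] <- NA_character_
```

Selecting the columns by name also makes the call safe to rerun, since the new Ethnicity column is not among the columns being looped over.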
akrun