Creating an Ethnicity Variable with Multiple Column Names as Variables

Question

I have a survey dataset that includes self-reported ethnicity. Participants were allowed to select as many ethnicities as they wanted to. The data structure looks like this:

Hispanic English Indian
1        NA      NA
NA       1       NA
NA       NA      1
NA       1       1
1        1       1

What I want to do is create a new categorical ethnicity variable where the column names take the place of the 1s above. In addition, if someone selected more than one ethnicity, then the categorical ethnicity variable should include both, like this:

Hispanic English Indian Ethnicity
1        NA      NA     Hispanic
NA       1       NA     English
NA       NA      1      Indian
NA       1       1      English_Indian
1        1       1      Hispanic_English_Indian

akrun · Accepted Answer · 2021-07-29T19:17:36.593

We can use apply to loop over the rows (MARGIN = 1) and, for each row, paste together the names of the columns whose values are not NA, separated by "_":

df1$Ethnicity <- apply(df1, 1, function(x) 
     paste(names(x)[!is.na(x)], collapse= "_"))

Output:

 df1
  Hispanic English Indian               Ethnicity
1        1      NA     NA                Hispanic
2       NA       1     NA                 English
3       NA      NA      1                  Indian
4       NA       1      1          English_Indian
5        1       1      1 Hispanic_English_Indian
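
If the real survey data has other columns as well, or respondents who selected nothing, the same idea can be restricted to the ethnicity columns. The following is only a sketch under those assumptions: df2, its id column, the all-NA sixth row, and eth_cols are hypothetical and not part of the original question.

```r
# Hypothetical data: the three ethnicity columns plus an extra id column,
# and a sixth respondent who selected no ethnicity at all
df2 <- data.frame(
  id       = 1:6,
  Hispanic = c(1L, NA, NA, NA, 1L, NA),
  English  = c(NA, 1L, NA, 1L, 1L, NA),
  Indian   = c(NA, NA, 1L, 1L, 1L, NA)
)

# Assumed list of the columns that hold ethnicity indicators
eth_cols <- c("Hispanic", "English", "Indian")

# Loop over rows of the ethnicity columns only, so that id is not pasted in
df2$Ethnicity <- apply(df2[eth_cols], 1, function(x)
  paste(names(x)[!is.na(x)], collapse = "_"))

# A row with no selection gives an empty string; recode it as NA
df2$Ethnicity[df2$Ethnicity == ""] <- NA
```

Subsetting with df2[eth_cols] also makes the code safe to rerun, since the newly created Ethnicity column is never fed back into apply.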

Data:

df1 <- structure(list(Hispanic = c(1L, NA, NA, NA, 1L),
                      English = c(NA, 1L, NA, 1L, 1L),
                      Indian = c(NA, NA, 1L, 1L, 1L)),
                 class = "data.frame", row.names = c(NA, -5L))
