How to combine check-all-that-apply race/ethnicity columns into a single column

Question

I have multiple columns in RStudio that correspond to race/ethnicity. These race/ethnicity variables are stored in 6 different columns, and participants could check every column that applies to them.

(Race.ethnicity1 = american indian or alaska native, race.ethnicity2 = asian or asian american, race.ethnicity3 = black or african-american, race.ethnicity4 = native american or pacific islander, race.ethnicity5 = white, race.ethnicity6 = other)

I am trying to combine these 6 columns into one Race/Ethnicity column in R. If participants checked multiple selections, they should be coded as biracial/multiracial. If they checked white and no other race/ethnicity selection, they should be coded as white; if they checked black and no other selection, they should be coded as black; and so on for the other categories.

I began the code with

PT_baseline$RaceEthnicity <- ifelse(
  PT_baseline$Race.Ethnicity_5 == "White" &
    !(PT_baseline$Race.Ethnicity_1 == "NA" | PT_baseline$Race.Ethnicity_2 == "NA" |
      PT_baseline$Race.Ethnicity_3 == "NA" | PT_baseline$Race.Ethnicity_4 == "NA" |
      PT_baseline$Race.Ethnicity_6 == "NA"),
  "White", NA)
table(PT_baseline$RaceEthnicity)

But I think this approach would get very long if repeated for every category. Is there a simpler way to write this?

welcome to stackoverflow. i recommend [taking the tour](https://stackoverflow.com/tour), as well as reading [how to ask a good question](https://stackoverflow.com/help/how-to-ask) and [what's on topic](https://stackoverflow.com/help/on-topic). — Franz Gleichmann, Jan 21 '22 at 21:09
This is a good situation where it's better to have tidy data instead of wide. — Bill O'Brien, Jan 21 '22 at 21:46
Does this answer your question? [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) — Dan Adams, Jan 31 '22 at 15:46
Please provide enough code so others can better understand or reproduce the problem. — Community, Jan 31 '22 at 15:47

score 0 · Answer 1 · answered Jan 21 '22 at 21:44

Hi ClinicPsych_ebeme and welcome to Stack Overflow! In the future, please try to post a reproducible example in your question. These help respondents better diagnose and address your questions.

In response to your question, I think you can accomplish this in three steps:

1. Convert your data from wide to long format. Right now, as I understand it, you have one column per ethnicity; you want a single ethnicity column instead. The melt() function in the reshape2 package can do this.
2. Count the number of selected ethnicities per participant, using the summarize() or mutate() functions of the dplyr package.
3. Create a column that indicates whether a study participant is multiracial based on the number of ethnicities selected, using the case_when() function of the dplyr package.

Below is a reproducible example of what I think you are trying to accomplish:

##Loading Necessary Packages
library(reshape2)# For melt() function
library(dplyr)# For other functions

##Creating Fake Data##
set.seed(5)#For reproducibility
participant<-1:25
africanamerican<-sample(c(0,1), 25, replace=TRUE)
set.seed(21)#For new random sample
caucasian<-sample(c(0,1), 25, replace=TRUE)
set.seed(2)#For new random sample
indigenous<-sample(c(0,1), 25, replace=TRUE)
set.seed(11)#For new random sample
pacificislander<-sample(c(0,1), 25, replace=TRUE)
set.seed(19)#For new random sample
latinx<-sample(c(0,1), 25, replace=TRUE)
set.seed(15)#For new random sample
asianamerican<-sample(c(0,1), 25, replace=TRUE)
set.seed(33)#For new random sample
middleeastern<-sample(c(0,1), 25, replace=TRUE)
set.seed(41)#For new random sample
other<-sample(c(0,1), 25, replace=TRUE)

##Fake Data Frame##
df.wide<-data.frame(participant, africanamerican, indigenous, caucasian, pacificislander,
               asianamerican, latinx, middleeastern, other)


##Converting 0s to NAs for melt() function to remove unselected ethnicities for each participant##
df.wide[df.wide==0]<-NA

##Convert from wide to long format##
df.long<-melt(df.wide, id.vars=1, measure.vars=c(2:9), variable.name="ethnicity", na.rm=TRUE)

##Summarizing data to determine if participants reported multiple ethnicities##
##Note: participants who selected no ethnicity were dropped by na.rm=TRUE above##
df.final<-df.long %>%
  group_by(participant) %>%
  summarize(number_reported=n()) %>%
  mutate(multiracial = case_when(number_reported >= 2 ~ TRUE,
                                 TRUE ~ FALSE))
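
The summary above flags multiracial participants but stops short of the single label column the question asks for. Here is a minimal base R sketch of that last step, assuming the same 1/NA coding as the fake data. The `toy` data frame, its column names, and the "multiracial" label are hypothetical and chosen only for illustration; this is one possible approach, not necessarily what the answer's author had in mind.

```r
# Hypothetical toy data: 1 = box checked, NA = box not checked
toy <- data.frame(
  participant = 1:4,
  white = c(1, NA, 1, NA),
  black = c(NA, 1, 1, NA),
  asian = c(NA, NA, NA, NA)
)

race_cols <- c("white", "black", "asian")

# Logical matrix: TRUE where a box was checked
checked <- !is.na(toy[race_cols])

# Number of boxes each participant checked
n_checked <- unname(rowSums(checked))

# Name of the first checked column in each row
# (only meaningful for rows where exactly one box was checked)
first_checked <- race_cols[max.col(checked * 1, ties.method = "first")]

# No boxes -> NA, two or more -> "multiracial", exactly one -> that column's name
toy$race_ethnicity <- ifelse(
  n_checked == 0, NA_character_,
  ifelse(n_checked >= 2, "multiracial", first_checked)
)

print(toy[c("participant", "race_ethnicity")])
```

Because the label is derived from the column names, adding or removing categories only requires changing `race_cols`, which avoids writing one nested ifelse() per category.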
