I have multiple columns in RStudio that correspond to race/ethnicity. The race/ethnicity variables are spread across 6 different columns, and participants can check more than one column if it applies to them.

The columns are: Race.ethnicity1 = American Indian or Alaska Native, Race.ethnicity2 = Asian or Asian American, Race.ethnicity3 = Black or African-American, Race.ethnicity4 = Native American or Pacific Islander, Race.ethnicity5 = White, Race.ethnicity6 = Other.

I am trying to combine these 6 columns into one Race/Ethnicity column in R. If participants made multiple selections, they should be coded as biracial/multiracial. If they selected White and no other race/ethnicity, they should be coded as White; if they selected Black and no other option, they should be coded as Black; and so on for the other categories.

I began the code with

PT_baseline$RaceEthnicity <- ifelse(
  PT_baseline$Race.Ethnicity_5 == "White" &
    !(PT_baseline$Race.Ethnicity_1 == "NA" |
      PT_baseline$Race.Ethnicity_2 == "NA" |
      PT_baseline$Race.Ethnicity_3 == "NA" |
      PT_baseline$Race.Ethnicity_4 == "NA" |
      PT_baseline$Race.Ethnicity_6 == "NA"),
  "White", NA)
table(PT_baseline$RaceEthnicity)

But I think this approach would get very long once every category is covered. Is there a simpler way to write this code?

  • Welcome to Stack Overflow. I recommend [taking the tour](https://stackoverflow.com/tour), as well as reading [how to ask a good question](https://stackoverflow.com/help/how-to-ask) and [what's on topic](https://stackoverflow.com/help/on-topic). – Franz Gleichmann Jan 21 '22 at 21:09
  • This is a good situation where it's better to have tidy data instead of wide. – Bill O'Brien Jan 21 '22 at 21:46
  • Does this answer your question? [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – Dan Adams Jan 31 '22 at 15:46

1 Answer

Hi ClinicPsych_ebeme and welcome to Stack Overflow! In the future, please try to post a reproducible example in your question. These help respondents better diagnose and address your questions.

In response to your question, I think you can accomplish this in three steps:

  1. Convert your data from wide to long format. As I understand it, you currently have one column per ethnicity; you will instead want a single ethnicity column with one row per participant and selected ethnicity. You can use the melt() function in the reshape2 package to accomplish this.

  2. Count the number of selected ethnicities for each participant in that column. This can be accomplished with the summarize() or mutate() functions of the dplyr package.

  3. Create a column that indicates whether a study participant is multiracial based on the number of ethnicities selected. This can be accomplished with the case_when() function of the dplyr package.

Below is a reproducible example of what I think you are trying to accomplish:

##Loading Necessary Packages
library(reshape2)# For melt() function
library(dplyr)# For other functions

##Creating Fake Data##
set.seed(5)#For reproducibility
participant<-1:25# Participant IDs 1 through 25
africanamerican<-sample(c(0,1), 25, replace=TRUE)
set.seed(21)#For new random sample
caucasian<-sample(c(0,1), 25, replace=TRUE)
set.seed(2)#For new random sample
indigenous<-sample(c(0,1), 25, replace=TRUE)
set.seed(11)#For new random sample
pacificislander<-sample(c(0,1), 25, replace=TRUE)
set.seed(19)#For new random sample
latinx<-sample(c(0,1), 25, replace=TRUE)
set.seed(15)#For new random sample
asianamerican<-sample(c(0,1), 25, replace=TRUE)
set.seed(33)#For new random sample
middleeastern<-sample(c(0,1), 25, replace=TRUE)
set.seed(41)#For new random sample
other<-sample(c(0,1), 25, replace=TRUE)

##Fake Data Frame##
df.wide<-data.frame(participant, africanamerican, indigenous, caucasian, pacificislander,
               asianamerican, latinx, middleeastern, other)


##Converting 0s to NAs for melt() function to remove unselected ethnicities for each participant##
df.wide[df.wide==0]<-NA

##Convert from wide to long format##
df.long<-melt(df.wide, id.vars=1, measure.vars=c(2:9), variable.name="ethnicity", na.rm=TRUE)

##Summarizing data to determine if participants reported multiple ethnicities##
df.final<-df.long %>% 
  group_by(participant) %>% 
  summarize(number_reported=n()) %>% 
  mutate(multiracial = case_when(number_reported >= 2 ~ TRUE,
                                 TRUE ~ FALSE))
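
The summary above flags multiracial participants but does not yet produce the single Race/Ethnicity label the question asks for. The following is a minimal base R sketch of one way to get that label directly from the wide data. It assumes, as the question's code suggests, that each race/ethnicity column holds its label when checked and a true missing value (NA) otherwise; the toy data frame, its three columns, and the labels in it are hypothetical. If the real data stores the literal string "NA", it would first need converting to a real NA for is.na() to detect it.

```r
# Hypothetical data in the layout the question describes: each column holds
# its label when the box was checked and NA otherwise.
PT_baseline <- data.frame(
  Race.Ethnicity_1 = c(NA, NA, "American Indian or Alaska Native", NA),
  Race.Ethnicity_2 = c(NA, "Asian", NA, NA),
  Race.Ethnicity_5 = c("White", "White", NA, NA),
  stringsAsFactors = FALSE
)

# Pick up every race/ethnicity column by name pattern.
race_cols <- grep("^Race\\.Ethnicity_", names(PT_baseline), value = TRUE)

# Number of boxes each participant checked (non-missing cells per row).
n_checked <- rowSums(!is.na(PT_baseline[race_cols]))

# First non-missing label in each row; NA when nothing was checked.
first_label <- apply(PT_baseline[race_cols], 1, function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else x[[1]]
})

# Two or more selections -> multiracial; otherwise the single label (or NA).
PT_baseline$RaceEthnicity <- ifelse(
  n_checked >= 2, "Biracial/Multiracial", first_label
)
```

Participants who checked nothing keep NA here instead of being dropped, which differs from the long-format summary above, where rows with no selections disappear after melt(..., na.rm=TRUE).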


Sean McKenzie