I have a dataset in which respondents could select multiple responses to a question about their nationality. Most selected only one category, but some selected several (including a free-text option, whose entries I will report separately). I want to know how to honour the people who selected multiple responses without distorting the rest of the data.
Effectively, all I want is basic demographics (n, mean, SD, etc.), so I am okay with the summed counts of the nationality groups exceeding the number of participants (unless there is a reason this is a bad idea that I haven't thought of, in which case please say). I ran my columns through as.numeric(),
which warned that some values had been coerced to NA (those with multiple responses). I know how to make the warning go away, e.g. with gsub(",", "", x),
but not in a meaningful way that preserves these people's answers. I saw a couple of solutions to this question here, but I'm still an R beginner, so I'm unsure which route is best.
I would be interested in any solution in which I can count those who selected multiple answers to this question as their own group, as well as within their original categories. For example, one table with English: 5, Welsh: 3, Scottish: 2, Northern Irish: 1, British: 4, Other: 0; and another with English: 3, Welsh: 1, Scottish: 1, Northern Irish: 1, British: 2, Other: 0, Multiple selected: 2.
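One way to get both tables is to split each response on its commas and count the individual codes. This is a base-R sketch, not the only approach; it assumes multi-selections are stored as comma-separated codes 1 to 6 with no other separators, and the names labels, codes, first_code and group are illustrative rather than taken from the original data.

```r
# Dummy data from the question; c() stores every response as text.
Nationality <- c(1, "1,2,3,5", 2, "1,2,5", 1, 1, 3, 5, 5, 4)

# Assumed code-to-label mapping, taken from the recode() call in the question.
labels <- c("English", "Welsh", "Scottish", "Northern Irish", "British", "Other")

# Split each response into its individual codes, e.g. "1,2,5" -> 1 2 5.
codes <- lapply(strsplit(Nationality, ","), as.integer)

# Table 1: every selection counts towards its own category, so the total
# (15 here) can exceed the number of respondents (10). Using all six
# factor levels keeps categories nobody chose, such as Other.
all_selections <- table(factor(unlist(codes), levels = 1:6, labels = labels))

# Table 2: anyone with more than one code goes into a separate group,
# so each respondent is counted exactly once.
is_multi <- lengths(codes) > 1
first_code <- vapply(codes, function(x) x[1], integer(1))
group <- ifelse(is_multi, "Multiple selected", labels[first_code])
one_per_person <- table(factor(group, levels = c(labels, "Multiple selected")))

all_selections
one_per_person
```

With the dummy data, the second table gives British: 2 and Multiple selected: 2, and its counts add up to the 10 respondents.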
Dummy data is as follows:
Nationality <- c(1, "1,2,3,5", 2, "1,2,5", 1, 1, 3, 5, 5, 4)
df <- data.frame(Nationality)  # character column: c() coerced every value to text
I also later re-code the numeric values to display the choice text, as below:
library(dplyr)
df <- df %>%
  mutate(Nationality = recode(Nationality,
                              '1' = 'English',
                              '2' = 'Welsh',
                              '3' = 'Scottish',
                              '4' = 'Northern Irish',
                              '5' = 'British',
                              '6' = 'Other'))
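Note that recode() matches whole values, so while the column is still text, a combined entry such as "1,2,5" is left unchanged rather than translated. A minimal base-R sketch of labelling each code after splitting, assuming the same 1 to 6 coding; lookup and Nationality_text are hypothetical names:

```r
# Dummy data from the question (a character vector).
Nationality <- c(1, "1,2,3,5", 2, "1,2,5", 1, 1, 3, 5, 5, 4)

# Named lookup vector: codes (as strings) -> choice text.
lookup <- c("1" = "English", "2" = "Welsh", "3" = "Scottish",
            "4" = "Northern Irish", "5" = "British", "6" = "Other")

# Translate each code inside a response, then paste the labels back
# together so multi-selections stay readable in a single column.
Nationality_text <- vapply(
  strsplit(Nationality, ","),
  function(x) paste(lookup[x], collapse = ", "),
  character(1)
)

Nationality_text
```

The combined strings are handy for listing who chose what; for counting, it is easier to work from the split codes.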
Here's the code I will run it through to get demographic statistics:
df %>%
  group_by(Nationality) %>%
  summarise(n = n()) %>%
  mutate(Percentage = round(100 * (n / sum(n)), 2))
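On whether a total above n is a problem: it is fine for counts, but it changes what Percentage means. Once multi-selections are split into one row per selection, sum(n) is the number of selections (15 in the dummy data), not the number of respondents (10), so n / sum(n) gives each nationality's share of selections. Multi-select items are often reported as a percentage of respondents instead, and those percentages can add up to more than 100. The base-R sketch below contrasts the two; the long data frame and its column names are illustrative assumptions. In a dplyr pipeline, tidyr::separate_rows(df, Nationality, sep = ",") produces a similar long layout, after which the respondent denominator would be the row count before splitting rather than sum(n).

```r
# Dummy data from the question.
Nationality <- c(1, "1,2,3,5", 2, "1,2,5", 1, 1, 3, 5, 5, 4)
labels <- c("English", "Welsh", "Scottish", "Northern Irish", "British", "Other")

# Long format: one row per (respondent, selected code).
codes <- strsplit(Nationality, ",")
long <- data.frame(
  id   = rep(seq_along(codes), lengths(codes)),
  code = as.integer(unlist(codes))
)

# Number of distinct respondents (10), not number of rows (15).
n_respondents <- length(unique(long$id))

counts <- as.data.frame(
  table(Nationality = factor(long$code, levels = 1:6, labels = labels)),
  responseName = "n"
)

# Share of all selections: sums to 100, but a person who ticked four
# boxes contributes four times.
counts$pct_of_selections <- round(100 * counts$n / sum(counts$n), 2)

# Share of respondents: reads as "x% of people chose this";
# the column can total more than 100.
counts$pct_of_respondents <- round(100 * counts$n / n_respondents, 2)

counts
```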
I tried converting the relevant columns of my dataset to numeric (including the nationality column):
df <- df %>% mutate(across(c(1, 2, 4, 5, 7, 13:57), as.numeric))
As predicted, this returned 'Warning: NAs introduced by coercion'. I've thought about extracting the column and applying the solutions from the post I linked, but haven't had any luck.
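Before converting, it may help to see exactly which entries trigger the warning, and to leave the nationality column out of the as.numeric() call so the multi-selections survive as text until they are split. A small base-R check on the dummy data (fails is a hypothetical name):

```r
# Dummy data from the question.
Nationality <- c(1, "1,2,3,5", 2, "1,2,5", 1, 1, 3, 5, 5, 4)

# TRUE where a non-missing value would become NA under as.numeric();
# suppressWarnings() only hides the expected coercion warning here.
fails <- !is.na(Nationality) & is.na(suppressWarnings(as.numeric(Nationality)))

Nationality[fails]   # the multi-selection entries
which(fails)         # their row positions
```

The same check can be run on df$Nationality or any other column in the across() list; if only the multi-selection rows show up, dropping the nationality column's position from c(1, 2, 4, 5, 7, 13:57) and splitting that column separately should avoid the warning.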
I haven't posted a question before, so please let me know if I need to provide any more information. I hope I've explained it well enough to give the gist of the problem.