I am tidying my data in R, and want to turn multiple columns into 1, using a function iterating over the items of a vector. I was wondering whether you could help me out to:
- work away a semantic error,
- and make my code more efficient?
My data is based on a survey with 32 questions. Each question has multiple answers. Each answer is a column, with options 1 and NA.
For one question, a section of the dataset can be reproduced as follows:
XV2_1 <- c(1,NA,NA,NA)
XV2_2 <- c(NA,1,NA,NA)
XV2_3 <- c(NA,NA,NA,1)
XV2_4 <- c(NA,NA,1,NA)
id <- c(12,13,14,15)
dat <- data.frame(id,XV2_1, XV2_2, XV2_3,XV2_4)
> dat
id XV2_1 XV2_2 XV2_3 XV2_4
1 12 1 NA NA NA
2 13 NA 1 NA NA
3 14 NA NA NA 1
4 15 NA NA 1 NA
This is the data I would like to have (
question_2_answers <- c("Yellow","Blue","Green","Orange") #this is a vector based on the answers of the questionnaire
collapsed <- c("Yellow","Blue","Orange","Green")
collapsed_dataframe <- data.frame(id,collapsed)
>collapsed_dataframe
id X2
1 12 Yellow
2 13 Blue
3 14 Green
4 15 Orange
So far, I tried a sequence of "ifelse's" combined with mutate:
library(tidyverse)
question_2_answers <- c("Yellow","Blue","Green","Orange") #this is a vector based on the answers of the questionnaire
dat %>%
mutate(
Colour = tidy_Q2(question_2_answers,XV2_1,XV2_2,XV2_3,XV2_4)
)
tidy_Q2 <- function(a,b,c,d,e) {
ifelse(b == 1, a[1],ifelse(
c==1,a[2],ifelse(
d==1,a[3],a[4])))
}
However, my output is not as expected:
id XV2_1 XV2_2 XV2_3 XV2_4 Colour
1 12 1 NA NA NA Yellow
2 13 NA 1 NA NA <NA>
3 14 NA NA NA 1 <NA>
4 15 NA NA 1 NA <NA>
I would have liked it to be as follows:
id XV2_1 XV2_2 XV2_3 XV2_4 Colour
1 12 1 NA NA NA Yellow
2 13 NA 1 NA NA Blue
3 14 NA NA NA 1 Green
4 15 NA NA 1 NA Orange
Does anyone know a way to remove the error? Another question that I'd like to ask, is whether my code can be more efficient? I have 32 survey_questions in store after this, I'd like to automate the process as much as possible. Notable things to take in mind:
- not all survey questions have the same amount of options (i.e. question 2 has 2 options and therefore 2 columns, whilst question 10 has 8 options and 8 columns)
- some values are strings, instead of 1 or NA
Always happy to learn,
Best,
Maria