0

We frequently ask scale questions in our social surveys; respondents provides their agreement with our statement (strongly agree, agree, neither nor, disagree, strongly disagree). The survey result usually comes in an aggregated format, i.e for each question(variable), the answers are provided in a single column, where 5=strongly agree, 1=strongly disagree etc.

enter image description here

Now we came across a new survey tool where answers were partitions into several columns for one question. For example Q1_1 column = Strongly agree for Q1, Q1_5 column = Strongly disagree. So for each question we received 5 columns of answers, if respondent answered Strongly Agree, Q1_1 related row is marked as 1, where Q1_2 - Q1_5 related row for that respondent are marked as 0.

enter image description here

Please can anyone kindly share a solution to 'aggregated' the answers from the new survey tool, so instead of having 5 columns for each question, we would have one column per question, with value 1-5.

I'm new to R, I thought R would handle this instead of having to manually change in Excel.

James Z
  • 12,209
  • 10
  • 24
  • 44
Shawn
  • 3
  • 1
  • 3
    Please do not post an image of code/data/errors: it cannot be copied or searched (SEO), it breaks screen-readers, and it may not fit well on some mobile devices. Ref: https://meta.stackoverflow.com/a/285557 (and https://xkcd.com/2116/). Please just include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly. – r2evans Dec 17 '20 at 14:02
  • 3
    Welcome to SO, Shawn! Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including attempted code (please be explicit about non-base packages), sample representative data (perhaps via `dput(head(x))` or building data programmatically (e.g., `data.frame(...)`), possibly stochastically after `set.seed(1)`), perhaps actual output (with verbatim errors/warnings) versus intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans Dec 17 '20 at 14:02
  • 1
    You could try a series of nested `ifelse()` statements. `ifelse(col_1 == 1, 1, ifelse(col_2 ==1, 2, ifelse(col_3 == 1, 3, ifelse(col_4 == 1, 4, 5))))` – edsandorf Dec 17 '20 at 14:06

2 Answers2

1

Try this approach reshaping and next time follow the advice from @r2evans as we have to type data. Here the code:

library(dplyr)
library(tidyr)
#Data
df <- data.frame(Respondent=paste0('Respondent',1:10),
                 Q6_1=c(1,0,1,1,1,1,0,0,0,1),
                 Q6_2=c(0,1,0,0,0,0,1,1,0,1),
                 Q6_3=rep(0,10),
                 Q6_4=c(rep(0,8),1,0),stringsAsFactors = F
                 )
#Code
new <- df %>% pivot_longer(-Respondent) %>%
  separate(name,c('variable','answer'),sep='_') %>%
  filter(value==1) %>%
  select(-value) %>%
  filter(!duplicated(Respondent)) %>%
  pivot_wider(names_from = variable,values_from=answer)

Output:

# A tibble: 10 x 2
   Respondent   Q6   
   <chr>        <chr>
 1 Respondent1  1    
 2 Respondent2  2    
 3 Respondent3  1    
 4 Respondent4  1    
 5 Respondent5  1    
 6 Respondent6  1    
 7 Respondent7  2    
 8 Respondent8  2    
 9 Respondent9  4    
10 Respondent10 1 

I only curious why your data in case of member 10 have two values of 1. Maybe a typo or is that possible?

Duck
  • 39,058
  • 13
  • 42
  • 84
  • 1
    Thank you, it was a typo for respondent10. Sure I will try to present the data frame instead of screenshot next time. – Shawn Dec 17 '20 at 15:36
  • @Shawn Great! Nice to help you :) please next time use `dput()` for your questions. Sometimes data is too large and we can type it :) – Duck Dec 17 '20 at 15:37
0

We can use data.table methods

library(data.table)
dcast(unique(melt(setDT(df), id.var = 'Respondent')[,
  c('variable', 'answer') := tstrsplit(variable, '_', 
   type.convert = TRUE)][value == 1], by = "Respondent"), 
    Respondent  ~ variable, value.var = 'answer')

-output

#      Respondent Q6
# 1:  Respondent1  1
# 2: Respondent10  1
# 3:  Respondent2  2
# 4:  Respondent3  1
# 5:  Respondent4  1
# 6:  Respondent5  1
# 7:  Respondent6  1
# 8:  Respondent7  2
# 9:  Respondent8  2
#10:  Respondent9  4

data

df <- structure(list(Respondent = c("Respondent1", "Respondent2", "Respondent3", 
"Respondent4", "Respondent5", "Respondent6", "Respondent7", "Respondent8", 
"Respondent9", "Respondent10"), Q6_1 = c(1, 0, 1, 1, 1, 1, 0, 
0, 0, 1), Q6_2 = c(0, 1, 0, 0, 0, 0, 1, 1, 0, 1), Q6_3 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), Q6_4 = c(0, 0, 0, 0, 0, 0, 0, 0, 
1, 0)), class = "data.frame", row.names = c(NA, -10L))
akrun
  • 874,273
  • 37
  • 540
  • 662