How to use R for handling (pivoting?) social survey raw data?

Question

We frequently ask scale questions in our social surveys; respondents provides their agreement with our statement (strongly agree, agree, neither nor, disagree, strongly disagree). The survey result usually comes in an aggregated format, i.e for each question(variable), the answers are provided in a single column, where 5=strongly agree, 1=strongly disagree etc.

Now we came across a new survey tool where answers were partitions into several columns for one question. For example Q1_1 column = Strongly agree for Q1, Q1_5 column = Strongly disagree. So for each question we received 5 columns of answers, if respondent answered Strongly Agree, Q1_1 related row is marked as 1, where Q1_2 - Q1_5 related row for that respondent are marked as 0.

Please can anyone kindly share a solution to 'aggregated' the answers from the new survey tool, so instead of having 5 columns for each question, we would have one column per question, with value 1-5.

I'm new to R, I thought R would handle this instead of having to manually change in Excel.

Please do not post an image of code/data/errors: it cannot be copied or searched (SEO), it breaks screen-readers, and it may not fit well on some mobile devices. Ref: https://meta.stackoverflow.com/a/285557 (and https://xkcd.com/2116/). Please just include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly. — r2evans, Dec 17 '20 at 14:02
Welcome to SO, Shawn! Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including attempted code (please be explicit about non-base packages), sample representative data (perhaps via `dput(head(x))` or building data programmatically (e.g., `data.frame(...)`), possibly stochastically after `set.seed(1)`), perhaps actual output (with verbatim errors/warnings) versus intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. — r2evans, Dec 17 '20 at 14:02
You could try a series of nested `ifelse()` statements. `ifelse(col_1 == 1, 1, ifelse(col_2 ==1, 2, ifelse(col_3 == 1, 3, ifelse(col_4 == 1, 4, 5))))` — edsandorf, Dec 17 '20 at 14:06

score 1 · Accepted Answer · answered Dec 17 '20 at 14:11

Try this approach reshaping and next time follow the advice from @r2evans as we have to type data. Here the code:

library(dplyr)
library(tidyr)
#Data
df <- data.frame(Respondent=paste0('Respondent',1:10),
                 Q6_1=c(1,0,1,1,1,1,0,0,0,1),
                 Q6_2=c(0,1,0,0,0,0,1,1,0,1),
                 Q6_3=rep(0,10),
                 Q6_4=c(rep(0,8),1,0),stringsAsFactors = F
                 )
#Code
new <- df %>% pivot_longer(-Respondent) %>%
  separate(name,c('variable','answer'),sep='_') %>%
  filter(value==1) %>%
  select(-value) %>%
  filter(!duplicated(Respondent)) %>%
  pivot_wider(names_from = variable,values_from=answer)

Output:

# A tibble: 10 x 2
   Respondent   Q6   
   <chr>        <chr>
 1 Respondent1  1    
 2 Respondent2  2    
 3 Respondent3  1    
 4 Respondent4  1    
 5 Respondent5  1    
 6 Respondent6  1    
 7 Respondent7  2    
 8 Respondent8  2    
 9 Respondent9  4    
10 Respondent10 1

I only curious why your data in case of member 10 have two values of 1. Maybe a typo or is that possible?

Thank you, it was a typo for respondent10. Sure I will try to present the data frame instead of screenshot next time. — Shawn, Dec 17 '20 at 15:36
@Shawn Great! Nice to help you :) please next time use `dput()` for your questions. Sometimes data is too large and we can type it :) — Duck, Dec 17 '20 at 15:37

score 0 · Answer 2 · answered Dec 17 '20 at 16:28

We can use data.table methods

library(data.table)
dcast(unique(melt(setDT(df), id.var = 'Respondent')[,
  c('variable', 'answer') := tstrsplit(variable, '_', 
   type.convert = TRUE)][value == 1], by = "Respondent"), 
    Respondent  ~ variable, value.var = 'answer')

-output

#      Respondent Q6
# 1:  Respondent1  1
# 2: Respondent10  1
# 3:  Respondent2  2
# 4:  Respondent3  1
# 5:  Respondent4  1
# 6:  Respondent5  1
# 7:  Respondent6  1
# 8:  Respondent7  2
# 9:  Respondent8  2
#10:  Respondent9  4

data

df <- structure(list(Respondent = c("Respondent1", "Respondent2", "Respondent3", 
"Respondent4", "Respondent5", "Respondent6", "Respondent7", "Respondent8", 
"Respondent9", "Respondent10"), Q6_1 = c(1, 0, 1, 1, 1, 1, 0, 
0, 0, 1), Q6_2 = c(0, 1, 0, 0, 0, 0, 1, 1, 0, 1), Q6_3 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), Q6_4 = c(0, 0, 0, 0, 0, 0, 0, 0, 
1, 0)), class = "data.frame", row.names = c(NA, -10L))

How to use R for handling (pivoting?) social survey raw data?

2 Answers2

data