
I have the following long format data that I would like to transform into wide format using R:

structure(list(survey_unique_id = c(2816790L, 2816790L, 2816790L, 
2585861L, 2585861L, 214733L, 214733L, 214733L, 224481L, 224481L, 
224481L), user_id = c(623333L, 623333L, 623333L, 623333L, 623333L, 
700200L, 700200L, 700200L, 700200L, 700200L, 700200L), 
survey_completion_date = c("3/3/2021 16:39", "3/3/2021 16:39", 
"3/3/2021 16:39", "1/29/2021 22:14", "1/29/2021 22:14", 
"11/27/2017 19:02", "11/27/2017 19:02", "11/27/2017 19:02", "12/19/2017 21:02", 
"12/19/2017 21:02", "12/19/2017 21:02"), survey_id = c(1L, 1L, 
1L, 4L, 4L, 1L, 1L, 1L, 9L, 9L, 9L), question_id = c(1L, 2L, 
3L, 6L, 7L, 1L, 2L, 3L, 19L, 20L, 21L), question_score = c(7L, 
7L, 9L, 13L, 5L, 18L, 12L, 15L, 11L, 12L, 12L)), class = 
"data.frame", row.names = c(NA, -11L))

Original long format mock data

Right now, each row is a participant’s answer to a single question on one of what could be multiple surveys. Ideally, I would like each row to be a participant, like this:

structure(list(user_id = c(623333L, 700200L), survey_1_question_1_score = c(7L, 
18L), survey_1_question_2_score = c(7L, 12L), survey_1_question_3_score = c(9L, 
15L), survey_4_question_6_score = c(13L, NA), survey_4_question_7_score = c(5L, 
NA), survey_9_question_19_score = c(NA, 11L), survey_9_question_20_score = c(NA, 
12L), survey_9_question_21_score = c(NA, 12L)), class = "data.frame", row.names = c(NA, 
-2L))

Ideal wide format mock data

An issue here is that the original data only has the survey completion date and doesn’t indicate how many times each participant has already taken a given survey, so I think I will have to create a “Survey Number” column before reshaping. I am not sure how to create this new column in R (if that is the right next step) or how to then get the data into wide format. I can’t use Excel because the file is too large. What is the simplest way forward here?

EDIT: Thanks everyone for the tip to use dput() and my sincere apologies for not doing a better job asking the question the first time around. This is my first time asking a question on Stack Overflow!

  • Welcome. Please use ``dput()`` to share your data and don't share your data using images. Thank you. – user438383 Sep 22 '21 at 14:41
  • Please create a reproducible example, like explained [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Don't paste pictures of data. – phiver Sep 22 '21 at 14:51

1 Answer

library(tidyverse)
data <-
  tribble(
  ~participant, ~ question, ~ score,
  1, 1, 1,
  1, 2, 0,
  1, 3, 0,
  1, 1, 1,
  1, 2, 0,
  1, 3, 1,
  2, 1, 0,
  2, 2, 0,
  2, 3, 0,
  2, 1, 0,
  2, 2, 0,
  2, 3, 1,
)
data
#> # A tibble: 12 x 3
#>    participant question score
#>          <dbl>    <dbl> <dbl>
#>  1           1        1     1
#>  2           1        2     0
#>  3           1        3     0
#>  4           1        1     1
#>  5           1        2     0
#>  6           1        3     1
#>  7           2        1     0
#>  8           2        2     0
#>  9           2        3     0
#> 10           2        1     0
#> 11           2        2     0
#> 12           2        3     1

data %>%
  # number each participant's repeated answers to the same question;
  # this assumes rows are in survey order and each survey asks a question once
  group_by(participant, question) %>%
  mutate(survey = row_number()) %>%
  # drop the grouping so the reshaped result is a plain tibble
  ungroup() %>%

  # combine survey and question into one key (e.g. "1_3") for the column names
  unite(group, c(survey, question)) %>%
  pivot_wider(names_from = group, values_from = score)
#> # A tibble: 2 x 7
#>   participant `1_1` `1_2` `1_3` `2_1` `2_2` `2_3`
#>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1           1     1     0     0     1     0     1
#> 2           2     0     0     0     0     0     1
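
The row_number() step above relies on the rows already being in survey order. If the data carries a completion timestamp, as in the question, the survey number can instead be derived from it. The following is a minimal sketch under that assumption: the toy rows, the completed_at and survey_number column names, and the "%m/%d/%Y %H:%M" format are illustrative guesses based on the question's sample, not something the original data is known to guarantee.

```r
library(dplyr)

# Hypothetical toy data: one user takes survey 1 twice (values made up)
long <- data.frame(
  user_id = c(1L, 1L, 1L, 1L),
  survey_id = c(1L, 1L, 1L, 1L),
  question_id = c(1L, 2L, 1L, 2L),
  survey_completion_date = c("3/3/2021 16:39", "3/3/2021 16:39",
                             "1/29/2021 22:14", "1/29/2021 22:14"),
  question_score = c(7L, 9L, 13L, 5L)
)

numbered <- long %>%
  # parse the text timestamps so they sort chronologically, not alphabetically
  mutate(completed_at = as.POSIXct(survey_completion_date,
                                   format = "%m/%d/%Y %H:%M", tz = "UTC")) %>%
  # within each user and survey, rank the distinct completion times:
  # the earliest attempt gets 1, the next gets 2, and so on
  group_by(user_id, survey_id) %>%
  mutate(survey_number = dense_rank(completed_at)) %>%
  ungroup()
```

Because the ranking is done on the parsed timestamp, the result does not depend on the row order of the input.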

Created on 2021-09-22 by the reprex package (v2.0.1)
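
Applied to the columns from the question, the same idea might look like the sketch below. It assumes that, as in the mock data, each user has at most one score per survey_id and question_id, so no extra survey number is needed; the column naming pattern is copied from the desired output in the question. The names long and wide are placeholders.

```r
library(dplyr)
library(tidyr)

# The question's mock data, minus the completion date
long <- data.frame(
  user_id = c(623333L, 623333L, 623333L, 623333L, 623333L,
              700200L, 700200L, 700200L, 700200L, 700200L, 700200L),
  survey_id = c(1L, 1L, 1L, 4L, 4L, 1L, 1L, 1L, 9L, 9L, 9L),
  question_id = c(1L, 2L, 3L, 6L, 7L, 1L, 2L, 3L, 19L, 20L, 21L),
  question_score = c(7L, 7L, 9L, 13L, 5L, 18L, 12L, 15L, 11L, 12L, 12L)
)

wide <- long %>%
  pivot_wider(
    id_cols = user_id,                       # one row per participant
    names_from = c(survey_id, question_id),  # one column per survey/question pair
    values_from = question_score,
    # build names such as survey_1_question_1_score
    names_glue = "survey_{survey_id}_question_{question_id}_score"
  )
```

Surveys a participant never took come out as NA. If a user can take the same survey more than once, the survey/question pair no longer identifies a single value and pivot_wider() will warn and return list columns; in that case a survey number would have to be added to names_from as well.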

danlooo