
Here is some basic information about my data. It is a longitudinal dataset with the following variables: ID, Age, Gender, Q1AnswerTime1, Q2AnswerTime1, Q3AnswerTime1, Q1AnswerTime2, Q2AnswerTime2, Q3AnswerTime2, Q1AnswerTime3, Q2AnswerTime3, Q3AnswerTime3.

How can I reshape this dataset from wide format to long format? I want the long format to contain only 7 variables: ID, Age, Gender, Q1Answer, Q2Answer, Q3Answer, and Time, where the values of Q1Answer, Q2Answer, and Q3Answer depend on the Time variable.

To be clear, the original study followed 5 people over 3 years. Every year, each person was asked 3 questions: Q1, Q2, and Q3. So in the end, the wide format has 12 variables (ID, Age, and Gender, plus 3 questions × 3 time points).
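As a quick sanity check on those counts, here is a minimal R sketch of the arithmetic (the variable names are illustrative only). With 5 people, 3 time points and 3 questions, the wide table has 3 fixed columns plus 3 × 3 answer columns, and the long table should have one row per person per time point.

```r
# Study design as described above (variable names are illustrative)
n_people    <- 5  # participants
n_times     <- 3  # yearly waves
n_questions <- 3  # Q1, Q2, Q3
n_fixed     <- 3  # ID, Age, Gender

# Wide: one row per person, one column per question x time point
wide_rows <- n_people
wide_cols <- n_fixed + n_questions * n_times  # 3 + 9 = 12

# Long: one row per person x time point, one column per question, plus Time
long_rows <- n_people * n_times               # 5 * 3 = 15
long_cols <- n_fixed + n_questions + 1        # 3 + 3 + 1 = 7

cat("wide:", wide_rows, "rows x", wide_cols, "columns\n")
cat("long:", long_rows, "rows x", long_cols, "columns\n")
```

So for the full study the long table should have 15 rows; the 3-person sample below gives 9.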

Update: here is a code sample:

library(tibble)

df <- tibble(
  ID = c(1, 2, 3),
  Age = c(25, 32, 28),
  Gender = c("Male", "Female", "Male"),
  Q1AnswerTime1 = c(10, 15, 12),
  Q2AnswerTime1 = c(7, 9, 8),
  Q3AnswerTime1 = c(5, 6, 4),
  Q1AnswerTime2 = c(11, 16, 13),
  Q2AnswerTime2 = c(8, 10, 9),
  Q3AnswerTime2 = c(6, 7, 5),
  Q1AnswerTime3 = c(12, 17, 14),
  Q2AnswerTime3 = c(9, 11, 10),
  Q3AnswerTime3 = c(7, 8, 6)
)

df

The expected output looks like this:

dfLong <- tibble(
  ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Age = c(25, 25, 25, 32, 32, 32, 28, 28, 28),
  Gender = c("Male", "Male", "Male", "Female", "Female", "Female", "Male", "Male", "Male"),
  Q1 = c(10, 11, 12, 15, 16, 17, 12, 13, 14),
  Q2 = c(7, 8, 9, 9, 10, 11, 8, 9, 10),
  Q3 = c(5, 6, 7, 6, 7, 8, 4, 5, 6),
  Time = c(1, 2, 3, 1, 2, 3, 1, 2, 3)
)
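To make the mapping between the two tables explicit, here is a minimal base-R sketch (not the tidyr approach from the answer) that builds the expected output by hand. Each long column Q1, Q2, Q3 is that question's three time columns read across each person's row, and the fixed columns are repeated once per time point. The helper `stack_question()` and the object `manualLong` are hypothetical names introduced only for this illustration.

```r
library(tibble)

# Sample wide data from the question
df <- tibble(
  ID = c(1, 2, 3),
  Age = c(25, 32, 28),
  Gender = c("Male", "Female", "Male"),
  Q1AnswerTime1 = c(10, 15, 12),
  Q2AnswerTime1 = c(7, 9, 8),
  Q3AnswerTime1 = c(5, 6, 4),
  Q1AnswerTime2 = c(11, 16, 13),
  Q2AnswerTime2 = c(8, 10, 9),
  Q3AnswerTime2 = c(6, 7, 5),
  Q1AnswerTime3 = c(12, 17, 14),
  Q2AnswerTime3 = c(9, 11, 10),
  Q3AnswerTime3 = c(7, 8, 6)
)

n_times <- 3

# Hypothetical helper: take one question's Time1..Time3 columns as a
# person-by-time matrix, transpose it, and flatten it column-wise so the
# values come out person by person (Time1, Time2, Time3 for ID 1, then ID 2, ...)
stack_question <- function(data, q) {
  cols <- paste0("Q", q, "AnswerTime", seq_len(n_times))
  as.vector(t(as.matrix(data[, cols])))
}

manualLong <- tibble(
  ID     = rep(df$ID, each = n_times),      # repeat each person's fixed values
  Age    = rep(df$Age, each = n_times),
  Gender = rep(df$Gender, each = n_times),
  Q1     = stack_question(df, 1),
  Q2     = stack_question(df, 2),
  Q3     = stack_question(df, 3),
  Time   = rep(as.numeric(seq_len(n_times)), times = nrow(df))  # double, like dfLong
)

manualLong
```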

I tried using tidyr functions in R, but this example is too complex for me. Can anyone help me with this?

Thanks!

Mark
  • Instead of describing the data, give a sample, i.e., data with 3-4 rows that represent your true data. – Onyambu Jun 19 '23 at 05:09
  • Welcome to Stack Overflow. Here’s a link that may be useful for asking a question in a way that makes it easier for others to help. [Link for guidance on asking questions](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Peter Jun 19 '23 at 08:39

1 Answer

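library(tidyverse)

# Pivot the data from wide to long format
df %>%
  pivot_longer(
    cols = starts_with("Q"),
    # ".value" turns the part before "AnswerTime" (Q1, Q2, Q3) into column names;
    # the part after it (1, 2, 3) goes into a new column called Time
    names_to = c(".value", "Time"),
    names_sep = "AnswerTime"
  ) %>%
  # Move Time to the last column to match the expected output
  select(-Time, Time)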

# A tibble: 9 × 7
     ID   Age Gender    Q1    Q2    Q3 Time 
  <dbl> <dbl> <chr>  <dbl> <dbl> <dbl> <chr>
1     1    25 Male      10     7     5 1    
2     1    25 Male      11     8     6 2    
3     1    25 Male      12     9     7 3    
4     2    32 Female    15     9     6 1    
5     2    32 Female    16    10     7 2    
6     2    32 Female    17    11     8 3    
7     3    28 Male      12     8     4 1    
8     3    28 Male      13     9     5 2    
9     3    28 Male      14    10     6 3    
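One difference from the expected output is that Time comes back as a character column (`<chr>`), whereas dfLong stores it as a number. A minimal variation, assuming a recent tidyverse (tidyr 1.1 or later for `names_transform`, dplyr 1.0 or later for `relocate()`), converts Time during the pivot and uses `names_pattern` to spell out both parts of each column name with a regular expression instead of `names_sep`. The object name `dfLong2` is just for illustration.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Sample wide data from the question
df <- tibble(
  ID = c(1, 2, 3),
  Age = c(25, 32, 28),
  Gender = c("Male", "Female", "Male"),
  Q1AnswerTime1 = c(10, 15, 12),
  Q2AnswerTime1 = c(7, 9, 8),
  Q3AnswerTime1 = c(5, 6, 4),
  Q1AnswerTime2 = c(11, 16, 13),
  Q2AnswerTime2 = c(8, 10, 9),
  Q3AnswerTime2 = c(6, 7, 5),
  Q1AnswerTime3 = c(12, 17, 14),
  Q2AnswerTime3 = c(9, 11, 10),
  Q3AnswerTime3 = c(7, 8, 6)
)

dfLong2 <- df %>%
  pivot_longer(
    cols = starts_with("Q"),
    names_to = c(".value", "Time"),
    # Group 1 ("Q1", "Q2", "Q3") becomes an output column via ".value";
    # group 2 (the digit after "AnswerTime") goes into Time
    names_pattern = "(Q[0-9])AnswerTime([0-9])",
    # Store Time as an integer instead of the default character
    names_transform = list(Time = as.integer)
  ) %>%
  # Put Time last, as in the expected output
  relocate(Time, .after = last_col())

dfLong2
```

Changing the first group to `(Q[0-9]Answer)` would instead produce the columns Q1Answer, Q2Answer, and Q3Answer that the question originally described.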
Mark
  • I just used your code to create an expected table and put the pictures in the question. Can you help me with that picture? With your code, the problem I have right now is how to put the values of the "value" variable under the "Question" variable. – Spencer Cui Jun 19 '23 at 17:29
  • What's the difference between Q1AnswerTime1, Q1AnswerTime2, Q2AnswerTime1, and Q2AnswerTime2? Which ones do you want to be Q1, and which Q2? – Mark Jun 20 '23 at 01:42
  • Oh, is Q1AnswerTime1 Question = 1, Time = 1? – Mark Jun 20 '23 at 01:47
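That reading is what `names_sep = "AnswerTime"` in the answer relies on: each column name is split at "AnswerTime", the left piece (Q1, Q2, Q3) identifies the question, and the right piece (1, 2, 3) the time point. A small base-R sketch of the same split, for illustration only (`measure_cols` and `name_map` are hypothetical names):

```r
# The nine answer columns in the question's order:
# all questions at Time1, then Time2, then Time3
measure_cols <- paste0("Q", rep(1:3, times = 3), "AnswerTime", rep(1:3, each = 3))

# Split each name at "AnswerTime", as names_sep = "AnswerTime" does for these names
parts <- strsplit(measure_cols, "AnswerTime", fixed = TRUE)

# One row per original column: which question and which time point it holds
name_map <- data.frame(
  column   = measure_cols,
  question = vapply(parts, `[`, character(1), 1),
  time     = as.integer(vapply(parts, `[`, character(1), 2))
)

name_map
```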