0

I have a dataset in wide format with no time variable and I would like to create a time variable and turn it into long format for longitudinal analysis. The original dataset looks like this:

id <- c(1,2,3)
pdq1 <- c(3,5,6)
pdq2 <- c(1,3,4)
pdq3 <- c(4,5,4)
scor_abp1 <- c(1,2,3)
scor_abp2 <- c(2,2,4)
scor_abp3 <- c(1,4,5)
dat <- data.frame(id,pdq1,pdq2,pdq3,scor_abp1,scor_abp2,scor_abp3)

Desired output:

id <- c(1,1,1,2,2,2,3,3,3)
time <- c(1,2,3,1,2,3,1,2,3)
pdq <- c(3,1,4,5,3,5,6,4,4)
abp <- c(1,2,1,2,2,4,3,4,5)
dat <- data.frame(id,time,pdq,abp)

pdq and scor_abp are two different measures. The variables with suffix 1 means it is measured at Time 1, with suffix 2 means it is measured at Time 2, etc.

I would appreciate all the help there is!!! Thanks in advance!

Bruh
  • 277
  • 1
  • 6
  • 2
    Please also provide the desired output so it's clear what exactly you need. Have you tried anything yet? This looks like a classic case of `pivot_longer` – MrFlick Feb 14 '23 at 14:49

1 Answers1

2

When you have multiple columns to pivot, you can use .value in pivot_longer. In names_to, ".value" relates to the multiple columns you want to pivot, which are followed by a digit (the time column). This all needs to be set up in the names_to and names_pattern arguments.

tidyr::pivot_longer(dat,
                    -id,
                    names_to = c(".value", "time"),
                    names_pattern = "(pdq|scor_abp)(\\d)")

# A tibble: 9 × 4
     id time    pdq scor_abp
  <dbl> <chr> <dbl>    <dbl>
1     1 1         3        1
2     1 2         1        2
3     1 3         4        1
4     2 1         5        2
5     2 2         3        2
6     2 3         5        4
7     3 1         6        3
8     3 2         4        4
9     3 3         4        5
Maël
  • 45,206
  • 3
  • 29
  • 67