0

I've read through a number of similar posts and tutorials but am really struggling to understand the solution to my issue. I have a dataset that is wide, and when I make it longer - I want to collapse two sets of data (both duration and results).

For each participant (id), there is a category, and then a series of blood test results. Each test has both duration (in days) and a result (numeric value).

Here's how it looks now:

id category duration_1 results_1 duration_2 results_2 duration_3 results_3
01 diabetic 58 32 65 56 76 87
02 prediabetic 54 32 65 25 76 35
03 unknown 46 65 65 56 21 67

How I'd like it to be is:

id category duration results
01 diabetic 58 32
01 diabetic 65 56
01 diabetic 76 87
02 prediabetic 54 32
02 prediabetic 65 25
02 prediabetic 76 35
03 unknown 46 65
03 unknown 65 25
03 unknown 21 67

I can get pivot longer to work for "results" - but I can't get it to pivot on both "results" and "duration".

Any assistance would be greatly appreciated. I'm still fairly new to R. Thanks!

AndrewGB
  • 16,126
  • 5
  • 18
  • 49
  • Not sure if there is a master duplicate, but there's a few out there, including this one: https://stackoverflow.com/questions/61940984/using-pivot-longer-with-multiple-paired-columns-in-the-wide-dataset – thelatemail Dec 17 '21 at 01:34
  • Does this answer your question? [Reshaping multiple sets of measurement columns (wide format) into single columns (long format)](https://stackoverflow.com/questions/12466493/reshaping-multiple-sets-of-measurement-columns-wide-format-into-single-columns) – camille Dec 17 '21 at 01:56
  • Also https://stackoverflow.com/q/25925556/5325862, https://stackoverflow.com/q/28729506/5325862, and https://stackoverflow.com/q/47272371/5325862 – camille Dec 17 '21 at 01:59

1 Answers1

3

One way is to separate the column names into two columns while you pivot (hence the names_sep below). Then, you can just drop the column number.

library(tidyverse)

df %>%
  tidyr::pivot_longer(!c(id, category),
                      names_to = c(".value", "num"),
                      names_sep = "_") %>%
  dplyr::select(-num)

Output

# A tibble: 9 × 4
  id    category    duration results
  <chr> <chr>          <dbl>   <dbl>
1 01    diabetic          32      23
2 01    diabetic          87      67
3 01    diabetic          98      78
4 02    prediabetic       43      45
5 02    prediabetic       34      65
6 02    prediabetic       12      12
7 03    unknown           32      54
8 03    unknown           75      45
9 03    unknown           43      34

Data

df <-
  structure(
    list(
      id = c("01", "02", "03"),
      category = c("diabetic", "prediabetic", "unknown"),
      duration_1 = c(32, 43, 32),
      results_1 = c(23, 45, 54),
      duration_2 = c(87, 34, 75),
      results_2 = c(67, 65, 45),
      duration_3 = c(98, 12, 43),
      results_3 = c(78, 12, 34)
    ),
    class = "data.frame",
    row.names = c(NA,-3L)
  )
AndrewGB
  • 16,126
  • 5
  • 18
  • 49
  • why use `dplyr::` etc? Youve already loaded `tidyverse` – Onyambu Dec 17 '21 at 01:07
  • also note `pivot_longer(df, -c(id, category), names_to = ".value",names_pattern = '([^_]+)')` – Onyambu Dec 17 '21 at 01:09
  • @Onyambu As far as `dplyr::`, I just prefer (in general) to always be explicit. Great, thanks for that suggestion on the names pattern. That's a really clever way to handle it! – AndrewGB Dec 17 '21 at 04:57
  • 1
    Its okay, though you do not need it once you have loaded the libraries. Thats the whole purpose of loading the library. eg you are not using `magritrr::\`%>%\`` – Onyambu Dec 17 '21 at 05:16
  • 1
    Thanks that worked well. Much appreciated! – Tori Russell Dec 17 '21 at 15:48