-1

I am working on database that contains patients' information collected through a 24-week time period. Each patient in the dataframe is represented by a single row with a unique id. There are four numerical variables (cpr, cf, h and m) that have been measured at 0, 12 and 24 weeks, so that 3 columns for each variable have been created: cpr0, cpr12, cpr24, cf0, cf12, cf24... and so on.

In order to successfully analyse this dataframe, it is much more convenient to transform it from wide to long format. What I would like to achieve is to have a new "time" variable with possible values c(0, 12, 24) and the rest of the measured variables coded without the reference to the time they were collected, like the sample dataframe in this image.

I have tried using pivot_longer() function from tidyr, but I don't seem to get the result I am expecting.

  • 4
    Welcome to Stack Overflow. We cannot read data into R from images. Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including a small representative dataset in a plain text format - for example the output from `dput(yourwidedata)`, if that is not too large. – neilfws Apr 12 '23 at 23:43
  • Please provide enough code so others can better understand or reproduce the problem. – Community Apr 13 '23 at 15:09

1 Answers1

1

Let's make some data that resembles what you described:

set.seed(1001)
dataset <- data.frame(id = LETTERS[1:5],
                      cpr0 = sample(1:20, 5, replace = TRUE),
                      cpr12 = sample(1:20, 5, replace = TRUE),
                      cpr24 = sample(1:20, 5, replace = TRUE),
                      cf0 = sample(1:20, 5, replace = TRUE),
                      cf12 = sample(1:20, 5, replace = TRUE),
                      cf24 = sample(1:20, 5, replace = TRUE),
                      h0 = sample(1:20, 5, replace = TRUE),
                      h12 = sample(1:20, 5, replace = TRUE),
                      h24 = sample(1:20, 5, replace = TRUE),
                      m0 = sample(1:20, 5, replace = TRUE),
                      m12 = sample(1:20, 5, replace = TRUE),
                      m24 = sample(1:20, 5, replace = TRUE))

dataset

  id cpr0 cpr12 cpr24 cf0 cf12 cf24 h0 h12 h24 m0 m12 m24
1  A    3    11    12  12   15   11  5   8  12  5  11  14
2  B   15     6     4   3    6   14  4  20   4 15   3   2
3  C   16    14    19   1    1    2 17  20  10 13   2  13
4  D    7     4    18   8    8    4  6  18   2 11  11  20
5  E   16    12    10  18    8   20 12   2   4  6  18   9

We need to pivot to a longer form, separate each variable into a name and a time, and then pivot wider again to put the variable names into columns.

One way to do that, using a regex to split between letter and number:

library(tidyr)

dataset_pivot <- dataset %>% 
  pivot_longer(-id) %>% 
  separate(name, into = c("var", "time"), "(?<=[a-z])(?=[0-9])") %>% 
  pivot_wider(names_from = "var", 
              values_from = "value")

Result:

dataset_pivot 

# A tibble: 15 x 6
   id    time    cpr    cf     h     m
   <chr> <chr> <int> <int> <int> <int>
 1 A     0         3    12     5     5
 2 A     12       11    15     8    11
 3 A     24       12    11    12    14
 4 B     0        15     3     4    15
 5 B     12        6     6    20     3
 6 B     24        4    14     4     2
 7 C     0        16     1    17    13
 8 C     12       14     1    20     2
 9 C     24       19     2    10    13
10 D     0         7     8     6    11
11 D     12        4     8    18    11
12 D     24       18     4     2    20
13 E     0        16    18    12     6
14 E     12       12     8     2    18
15 E     24       10    20     4     9
neilfws
  • 32,751
  • 5
  • 50
  • 63