Create new variable from common colname ending before pivoting long

Question

I have a data frame in which several variable names share a common ending ("0m", "6m" or "12m") that refers to the month the data were collected. Some variables are collected only once, whilst others are collected at three time points (0, 6 and 12 months). The data looks like this:

wide <- data.frame(id = 1:5,
                   agree = c("y","n","n","y","y"),
                   test1_0m = c(2,3,4,3,5),
                   test1_6m = c(3,5,2,6,7),
                   test1_12m = c(6,7,8,4,5),
                   score_0m = c(55,44,33,22,11),
                   score_6m = c(77,66,55,44,33),
                   score_12m = c(99,88,77,66,55))


> wide
  id agree test1_0m test1_6m test1_12m score_0m score_6m score_12m
1  1     y        2        3         6       55       77        99
2  2     n        3        5         7       44       66        88
3  3     n        4        2         8       33       55        77
4  4     y        3        6         4       22       44        66
5  5     y        5        7         5       11       33        55

I want to pivot_longer to get a column Month which has 0m, 6m, or 12m as entries, plus columns called test1 and score which have the result for the corresponding person and month.

I've found a really helpful answer here: https://stackoverflow.com/questions/69798752/pivot-longer-for-multiple-sets-having-the-same-names-to

But I don't know how to specify the correct regex to get the values I want.

I've tried this, which is wrong:

wide %>%
  pivot_longer(cols = contains("_"),
               names_to = c("Month", ".value"),
               names_pattern = "(.*\\_)(.*)",
               values_drop_na = TRUE)

This is the output I want:

long
   id agree Month test1 score
1   1     y    0m     2    55
2   2     n    0m     3    44
3   3     n    0m     4    33
4   4     y    0m     3    22
5   5     y    0m     5    11
6   1     y    6m     3    77
7   2     n    6m     5    66
8   3     n    6m     2    55
9   4     y    6m     6    44
10  5     y    6m     7    33
11  1     y   12m     6    99
12  2     n   12m     7    88
13  3     n   12m     8    77
14  4     y   12m     4    66
15  5     y   12m     5    55

score 2 · Answer 1 · answered Oct 31 '22 at 21:27

You can use the special ".value" sentinel in names_to here. According to the pivot_longer() documentation, it

indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overriding values_to entirely

You then also need to define two capturing groups in your regex, one for each entry in names_to:

wide %>%
  pivot_longer(contains('_'), 
               names_to = c('.value', 'Month'), 
               names_pattern = '(\\w+)_(\\w+)')

# A tibble: 15 × 5
      id agree Month test1 score
   <int> <chr> <chr> <dbl> <dbl>
 1     1 y     0m        2    55
 2     1 y     6m        3    77
 3     1 y     12m       6    99
 4     2 n     0m        3    44
 5     2 n     6m        5    66
 6     2 n     12m       7    88
 7     3 n     0m        4    33
 8     3 n     6m        2    55
 9     3 n     12m       8    77
10     4 y     0m        3    22
11     4 y     6m        6    44
12     4 y     12m       4    66
13     5 y     0m        5    11
14     5 y     6m        7    33
15     5 y     12m       5    55
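
To see what the two capturing groups pick up, here is a minimal base R sketch that applies the same pattern to the column names with sub(). It is only an illustration: the names cols, value_part and month_part are made up for this example, and tidyr's internal matching may differ in detail.

```r
# Column names selected by contains("_") in the question's data
cols <- c("test1_0m", "test1_6m", "test1_12m",
          "score_0m", "score_6m", "score_12m")

pattern <- "(\\w+)_(\\w+)"

# First capturing group: becomes the output column name (".value")
value_part <- sub(pattern, "\\1", cols)

# Second capturing group: becomes the entry in the Month column
month_part <- sub(pattern, "\\2", cols)

# Side-by-side view of how each name is split
data.frame(column = cols, value_part = value_part, month_part = month_part)
```

Because the first group lines up with ".value" and the second with "Month", the order of names_to has to follow the order of the groups in the pattern.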

score 1 · Accepted Answer · answered Oct 31 '22 at 21:30

I think you can just use names_sep instead of names_pattern, since the two parts of each name are separated by a single underscore. You also need to switch the order of the names to c(".value", "Month").

wide %>% 
  pivot_longer(cols = contains("_"), 
               names_to = c(".value", "Month"), 
               names_sep = "_")

Result:

# A tibble: 15 × 5
      id agree Month test1 score
   <int> <chr> <chr> <dbl> <dbl>
 1     1 y     0m        2    55
 2     1 y     6m        3    77
 3     1 y     12m       6    99
 4     2 n     0m        3    44
 5     2 n     6m        5    66
 6     2 n     12m       7    88
 7     3 n     0m        4    33
 8     3 n     6m        2    55
 9     3 n     12m       8    77
10     4 y     0m        3    22
11     4 y     6m        6    44
12     4 y     12m       4    66
13     5 y     0m        5    11
14     5 y     6m        7    33
15     5 y     12m       5    55
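
The result above is grouped by id, whereas the output requested in the question is grouped by Month. If the row order matters, one possible follow-up (a sketch that assumes dplyr is available, with long as an illustrative name) is to turn Month into a factor with explicit levels and sort on it:

```r
library(tidyr)
library(dplyr)

wide <- data.frame(id = 1:5,
                   agree = c("y","n","n","y","y"),
                   test1_0m = c(2,3,4,3,5),
                   test1_6m = c(3,5,2,6,7),
                   test1_12m = c(6,7,8,4,5),
                   score_0m = c(55,44,33,22,11),
                   score_6m = c(77,66,55,44,33),
                   score_12m = c(99,88,77,66,55))

long <- wide %>%
  pivot_longer(cols = contains("_"),
               names_to = c(".value", "Month"),
               names_sep = "_") %>%
  # Explicit levels give the order 0m, 6m, 12m;
  # sorting the raw strings would put "12m" before "6m"
  mutate(Month = factor(Month, levels = c("0m", "6m", "12m"))) %>%
  arrange(Month, id)

long
```

Month is a factor in this version; wrap it in as.character() afterwards if a plain character column is preferred.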

score 1 · Answer 3 · answered Nov 01 '22 at 02:34

Using data.table

library(data.table)
melt(setDT(wide), measure.vars = measure(value.name, month, sep = "_"))

Output:

      id  agree  month test1 score
    <int> <char> <char> <num> <num>
 1:     1      y     0m     2    55
 2:     2      n     0m     3    44
 3:     3      n     0m     4    33
 4:     4      y     0m     3    22
 5:     5      y     0m     5    11
 6:     1      y     6m     3    77
 7:     2      n     6m     5    66
 8:     3      n     6m     2    55
 9:     4      y     6m     6    44
10:     5      y     6m     7    33
11:     1      y    12m     6    99
12:     2      n    12m     7    88
13:     3      n    12m     8    77
14:     4      y    12m     4    66
15:     5      y    12m     5    55
