An example will best illustrate my issue. Here is my data frame:

df = data.frame(id = 1:6,
                period1_var_1 = rnorm(6, 20, 10),
                period1_var_2 = rnorm(6, 20, 10),
                period2_var_1 = rnorm(6, 20, 10),
                period2_var_2 = rnorm(6, 20, 10))

print(df)
  id period1_var_1 period1_var_2 period2_var_1   period2_var_2
1  1      17.80754    29.2438046      10.32224       18.137845
2  2      17.31409    16.5381384      20.47930       31.398457
3  3      18.28758    20.1996814      16.79001        9.392826
4  4      13.79413    15.6104777      15.46428       27.026170
5  5      35.27592     0.3142531      23.59174       25.132573
6  6      38.17034    13.3490548      15.94226       23.129076

In practice, my data frame has many variables (such as 'var_1', 'var_2') and many periods (such as 'period1', 'period2'). I want to pivot it to a longer data frame while keeping the variable names 'var_1', 'var_2' as column names in the new data set. I don't want to write out the variable names 'var_1', 'var_2' manually because, in practice, there are many more than two variables.

Here is the desired result:

library(tidyverse)

df_long = data.frame(id = rep(1:6, 2),
                     time = rep(c("period1", "period2"), each = 6),
                     var_1 = c(df$period1_var_1, df$period2_var_1),
                     var_2 = c(df$period1_var_2, df$period2_var_2))
print(df_long)
   id    time    var_1     var_2
1   1 period1 19.23147  6.779928
2   2 period1 11.50213 27.328238
3   3 period1 18.81068 -2.631681
4   4 period1 16.32453 21.365790
5   5 period1 38.69836 33.446293
6   6 period1 11.30663 17.967992
7   1 period2 19.23147 10.886480
8   2 period2 11.50213 25.068797
9   3 period2 18.81068 14.494535
10  4 period2 16.32453 30.959613
11  5 period2 38.69836  2.398250
12  6 period2 11.30663 20.876985

Note, finally, that I could not use the 'names_sep' argument of the 'pivot_longer' function, because I only want to split at the first '_': what comes before it is the period, and what comes after it is the variable name (which might itself contain other '_', as in the example).

  • This is just a pivot_longer: `df %>% pivot_longer(!id, names_pattern = "(period\\d+)_(var_\\d+)", names_to=c("time", ".value"))` – MrFlick Feb 25 '22 at 07:09
  • Actually, the 'var_1' and 'period1' names were just an example. I am looking for something more general in the 'names_pattern'. The column names do not have to start with 'period', but they need to be cut after the first '_'. The same applies to 'var_1' and 'var_2'. – Anthony Feb 25 '22 at 08:04
  • The `names_pattern=` argument can accept any regular expression with capture groups. You'll need to adapt it for whatever your actual column names are. – MrFlick Feb 25 '22 at 08:06
  • That's actually what I'm struggling to do. – Anthony Feb 25 '22 at 08:38
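
Following up on the comments above, here is a minimal sketch of a more general `names_pattern`. It assumes that the first '_' in every non-id column name separates the period from the variable name, and that `id` is the only column to keep fixed. The regular expression `^([^_]+)_(.*)$` captures everything before the first '_' in one group and everything after it (further '_' included) in a second group. The `set.seed(1)` call is only there to make the sketch reproducible and is not part of the original question.

```r
library(tidyr)

# Reproducible version of the example data (the seed is an assumption
# added for this sketch only).
set.seed(1)
df <- data.frame(id = 1:6,
                 period1_var_1 = rnorm(6, 20, 10),
                 period1_var_2 = rnorm(6, 20, 10),
                 period2_var_1 = rnorm(6, 20, 10),
                 period2_var_2 = rnorm(6, 20, 10))

df_long <- pivot_longer(
  df,
  cols = !id,
  # Group 1: everything up to the first "_" (no underscores allowed).
  # Group 2: everything after it, which may contain more underscores.
  names_pattern = "^([^_]+)_(.*)$",
  # Group 1 goes to a "time" column; ".value" tells pivot_longer to use
  # group 2 as the names of the output value columns.
  names_to = c("time", ".value")
)

print(df_long)
```

The result is ordered by `id` rather than by `time`; if the ordering shown in the desired output matters, something like `df_long[order(df_long$time, df_long$id), ]` should rearrange it. The pattern would need adjusting if the period part of a column name could itself contain '_'.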
