2

I've got a dataset that looks like this:

df_start <- tribble(
    ~name,   ~age, ~x1_sn_ctrl1, ~x1_listing2_2, ~x1_affect1, ~x2_sn_ctrl1, ~x1_listing2_2, ~x2_affect1, ~number,
    "John",   28,        1,            1,             9,           4,            5,                9,       6,
    "Paul",   27,        2,            1,             4,           1,            3,                3,       4,
    "Ringo",  31,        3,            1,             2,           2,            5,                8,       9)

I need to pivot_longer() while handling the groupings within my columns:

  • There are 2 x-values (1 and 2)
  • There are 3 questions (sn_ctrl1, listing2_2, affect1) for each x-value

In my actual dataset, there are 14 x's.

Essentially, what I'd like to do is to apply pivot_longer() to the x-values but leave my 3 questions (sn_ctrl1, listing2_2, affect1) wide.

What I'd like to end up with is this:

df_end <- tribble(
    ~name, ~age, ~xval, ~sn_ctrl1, ~listing2_2, ~affect1, ~number,
    "John", 28,    1,        1,         1,          9,       6,
    "John", 28,    2,        4,         5,          9,       6,
    "Paul", 27,    1,        2,         1,          4,       4,  
    "Paul", 27,    2,        1,         3,          3,       4, 
    "Ringo", 31,   1,        3,         1,          2,       9, 
    "Ringo", 31,   2,        2,         5,          8,       9)

I have tried lots of very unsuccessful attempts playing with regex in names_pattern & pivot_longer but am completely striking out.

Anyone know how to tackle this?

THANKS!

PS: Note that I tried to make a straightforward reproducible example. The actual names of my columns vary slightly. For instance, there is x1_sn_ctrl1 & x1_attr1_ctrl2.

wscampbell
  • 351
  • 2
  • 11

1 Answers1

7

You can use :

tidyr::pivot_longer(df_start, 
                    cols = -c(name, age, number), 
                    names_to = c("xval", ".value"),
                    names_pattern = 'x(\\d+)_(.*)')

Which yields

# A tibble: 9 x 7
  name    age number xval  sn_ctrl1 listing2_2 affect1
  <chr> <dbl>  <dbl> <chr>    <dbl>      <dbl>   <dbl>
1 John     28      6 1            1          1       9
2 John     28      6 2            4         NA       9
3 John     28      6 1           NA          5      NA
4 Paul     27      4 1            2          1       4
5 Paul     27      4 2            1         NA       3
6 Paul     27      4 1           NA          3      NA
7 Ringo    31      9 1            3          1       2
8 Ringo    31      9 2            2         NA       8
9 Ringo    31      9 1           NA          5      NA
DomQ
  • 4,184
  • 38
  • 37
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Amazing! Thanks so, so much @Ronak! So fast, clever, helpful. – wscampbell Feb 20 '20 at 10:47
  • I've made a slight edit to the column name in `df_start` above to reflect my actual situation--the format of the column names is actually varied (similar to how it is in the repro above). Because the format isn't identical for all columns, the `names_pattern` argument you offered doesn't work. I tried changing `names_pattern` as you have it to `names_sep = "_"`, which I thought would've worked but doesn't seem to interact with having only 2 arguments in the `names_to` parameter. Any suggestion of how to fix this? Sorry to have you double-work. I tried to solve it first! – wscampbell Feb 20 '20 at 15:34
  • @wscampbell you can use `tidyr::pivot_longer(df_start, cols = -c(name, age, number), names_to = c("xval", ".value"), names_pattern = 'x(\\d)_(.*)')` but since the column names becomes duplicate it creates a `.copy` variable. Maybe ask a new question? – Ronak Shah Feb 20 '20 at 17:15
  • That solved it! Perfect! There are no repeating column names, so `.copy` isn't being created. You nailed it! Thanks once again, @Ronak! – wscampbell Feb 20 '20 at 17:37
  • I took the liberty to update the answer so that it works with the current state of the question. Cheers @RonakShah – DomQ Dec 17 '20 at 10:00