complex(ish) pivot_longer in R

Question

I have a dataset that roughly looks like this:

person_id mem_was_there_1 mem_was_there_2 mem_was_there_3 new_number_yn_1 new_number_yn_2 new_number_yn_3
      <dbl>           <dbl>           <dbl>           <dbl> <lgl>           <lgl>           <lgl>          
1       100               1               2               3 FALSE           TRUE            FALSE          
2       101               4               5               6 TRUE            FALSE           FALSE

I need to pivot this data into something like this:

# A tibble: 6 x 4
  person_id    nr mem_was_there new_number_yn
      <dbl> <dbl>         <dbl> <lgl>        
1       100     1             1 FALSE        
2       100     2             2 TRUE         
3       100     3             3 FALSE        
4       101     1             4 TRUE         
5       101     2             5 FALSE        
6       101     3             6 FALSE

I would like to use a pivot_longer() from dplyr option. I tried using this code, but I do not use what to fill in at the ??? to regex to the third _. Ideally, I would like a separate names_sep for both 'mem_was_there_xx' and 'new_number_yn_xx'

df1 %>%
  pivot_longer(cols = c(matches("^mem_was_there"), matches("^new_number_yn")), 
               names_to = c('.value', 'nr'), 
               names_sep = ??? )

df1 <-
 tribble(~person_id, ~mem_was_there_1, ~mem_was_there_2, ~mem_was_there_3, ~new_number_yn_1,  ~new_number_yn_2, ~new_number_yn_3,
         100, 1, 2, 3, F, T, F,
         101, 4, 5, 6, T, F, F)

I would recommend to create a pivot_longer_spec object, which essentially is a data frame where you map your old variable names to your new desired variable names. Pretty awesome and flexible that way. — deschen, Jan 19 '21 at 12:45
See an example here: https://stackoverflow.com/a/61367970/2725773 — deschen, Jan 19 '21 at 12:46

deschen · Answer 1 · 2021-01-19T14:28:08.500

1

This should do the trick:

spec <- data.frame(.name  = names(df1)[-1],
                   nr     = rep(1:3, 2),
                   .value = c(rep("mem_was_there", 3), rep("new_number_yn", 3)),
                   stringsAsFactors = FALSE)

library(tidyverse)    
df1 %>%
  pivot_longer_spec(., spec)

gives:

# # A tibble: 6 x 4
#   person_id    nr mem_was_there new_number_yn
#       <dbl> <int>         <dbl> <lgl>        
# 1       100     1             1 FALSE        
# 2       100     2             2 TRUE         
# 3       100     3             3 FALSE        
# 4       101     1             4 TRUE         
# 5       101     2             5 FALSE        
# 6       101     3             6 FALSE

edited Jan 19 '21 at 14:28

answered Jan 19 '21 at 12:59

deschen

10,012
3
27
50

Thanks, this indeed works for this small fake dataset, but does not scale very well to larger datasets, where for example the nr is unknown (my data it is between 300 and 500 ) and when there are many different variables you want to exclude from the pivot or when the exact order of variables are not know either. Will keep this open for now, hoping on a pivot_longer() solution that will allow for a split just before the third `_`. I now jut use two pivot_longer calls and then a left_join(). – L Smeets Jan 20 '21 at 09:53
There‘s probably a regex way of splitting your names and then use the .value thing. However, I‘m also sure that you can dynamically create the spec data frame. You know the columns and column positions from your original wide format data set, so then it‘s only the question how to smartly put that info into the spec object. – deschen Jan 20 '21 at 09:56

score 0 · Accepted Answer · answered Jan 20 '21 at 16:49

As deschen recommended, I played around with the regex a bit and this pivot longer call does work as expected. (and is a bit cleaner than having to manually create the pivot_longer_spec and also works if there are an unequal amount of mem_was_there_x and new_number_yn_y (it will just insert missings, where applicable)

df1 %>% 
  pivot_longer(
    cols = c(matches("^mem_was_there"), matches("^new_number_yn")),
    names_to = c('.value', 'nr'),
    names_pattern = "([A-Za-z_]+_)([0-9]*)")

complex(ish) pivot_longer in R

2 Answers2

Linked