0

I'm trying to get the right regex for names_pattern() of pivot_longer().

  • Wide Data:
ID      WC_L1 WC_L2 Read_F_L1 Read_F_L2 Read_C_L2 APL_L1 APL_L2
  <glue>  <dbl> <dbl>     <dbl>     <dbl>     <dbl>  <dbl>  <dbl>
1 Person1    55    84        70        73        80     34     40
2 Person2    88   102        66       140        80     51     45
3 Person3   102    71        59        59        60     67     46
4 Person4    53    43        61        70        60     33     30
5 Person5    87   145        73       107        80     56     72
  • Desired output:

Get all names between _ and pivot them all into their columns, create a "group" column based on what is after the last _ (L1/L2)

ID WC  READ_F READ_C  APL  GROUP 
 .  .  .       .          L1
 .  .  .       .          L2
  • This is very similar to this, but I couldn't get the regex right. Thanks in advance!

  • Data:

structure(list(ID = structure(c("Person1", "Person2", "Person3", 
"Person4", "Person5"), class = c("glue", "character")), WC_L1 = c(55, 
88, 102, 53, 87), WC_L2 = c(84, 102, 71, 43, 145), Read_F_L1 = c(70, 
66, 59, 61, 73), Read_F_L2 = c(73, 140, 59, 70, 107), Read_C_L2 = c(80, 
80, 60, 60, 80), APL_L1 = c(34, 51, 67, 33, 56), APL_L2 = c(40, 
45, 46, 30, 72)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))
user438383
  • 5,716
  • 8
  • 28
  • 43
Larissa Cury
  • 806
  • 2
  • 11
  • 1
    `pivot_longer(wide, -ID, names_pattern = "(.*)_(L.*)$", names_to = c(".value", "GROUP"))` – r2evans May 15 '23 at 15:07
  • 1
    Another link provides good discussion of the use of `names_pattern=` and `names_to=`: https://stackoverflow.com/q/68058000/3358272 – r2evans May 15 '23 at 15:08
  • Thank you, @r2evans , it is working as expected! The Regex part is a work in progress for me. If you don't mind, would you explain the logic of yours? – Larissa Cury May 15 '23 at 15:13
  • 2
    It seems that we are interested in preserving the L1/L2 component, so we must capture that L and whatever is after it; `(L.*)$` means "must end with `L` and perhaps something more". We can make it more specific with `(L[12])$` (only L1/L2) if desired. From there, we want to keep the portion before the preceding underscore, so we need to add `(.*)_` before that, which will discard the `_` before `L` and keep everything before it. For regex help, I find https://stackoverflow.com/a/22944075/3358272 very helpful (though not R-specific, so extra backslashes are often required). – r2evans May 15 '23 at 15:22

1 Answers1

0

r2evan's comment answer is great, but in case you want to go further:

df %>% 
  pivot_longer(-ID, names_pattern = "(.*)_(.*)$", names_to = c(".value", "GROUP")) %>%
  pivot_longer(-c(ID, GROUP, WC, APL), names_pattern = "(.*)_(.*)$", names_to = c(".value", "READ"))
  # get the last character of every ID 
  mutate(ID = as.numeric(str_sub(ID, -1)), 
        GROUP = as.numeric(str_sub(GROUP, -1)))

# A tibble: 20 × 6
      ID GROUP    WC   APL name  value
   <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
 1     1     1    55    34 F        70
 2     1     1    55    34 C        NA
 3     1     2    84    40 F        73
 4     1     2    84    40 C        80
 5     2     1    88    51 F        66
 6     2     1    88    51 C        NA
 7     2     2   102    45 F       140
 8     2     2   102    45 C        80
 9     3     1   102    67 F        59
10     3     1   102    67 C        NA
11     3     2    71    46 F        59
Mark
  • 7,785
  • 2
  • 14
  • 34