23

I am trying to use pivot_longer. However, I am not sure how to use names_sep or names_pattern to solve this.

library(tidyr)  # tribble() is re-exported by tidyr

dat <- tribble(
     ~group,  ~BP,  ~HS,  ~BB, ~lowerBP, ~upperBP, ~lowerHS, ~upperHS, ~lowerBB, ~upperBB,
        "1", 0.51, 0.15, 0.05,     0.16,     0.18,      0.5,     0.52,     0.14,     0.16,
      "2.1", 0.67, 0.09, 0.06,     0.09,     0.11,     0.66,     0.68,     0.08,      0.1,
      "2.2", 0.36, 0.13, 0.07,     0.12,     0.15,     0.34,     0.38,     0.12,     0.14,
      "2.3", 0.09, 0.17, 0.09,     0.13,     0.16,     0.08,     0.11,     0.15,     0.18,
      "2.4", 0.68, 0.12, 0.07,     0.12,     0.14,     0.66,     0.69,     0.11,     0.13,
        "3", 0.53, 0.15, 0.06,     0.14,     0.16,     0.52,     0.53,     0.15,     0.16)
               

Desired output (First row from wide data)

group names values lower upper
    1    BP   0.51  0.16  0.18
    1    HS   0.15  0.50  0.52
    1    BB   0.05  0.14  0.16
Dave2e
Droc
  • Can you give an example of what the desired output looks like, as well as a reproducible data example using `dput`? – Fnguyen Apr 22 '20 at 14:14
  • Hi, thank you for the comment. I'm not familiar with dput, but I tried to make the desired output clearer. – Droc Apr 22 '20 at 14:57
  • Nevermind ```dput```, I hadn't seen tribble before but it works the same. – Fnguyen Apr 22 '20 at 14:59

4 Answers

29

Here is a solution that follows a method similar to the one @Fnguyen used, but with the newer pivot_longer and pivot_wider functions:

library(dplyr)
library(tidyr)

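# Split each column name into an optional prefix ("limit") and a two-letter measure ("name")
longer <- pivot_longer(dat, cols = -1, names_pattern = "(.*)(..)$", names_to = c("limit", "name")) %>%
  mutate(limit = ifelse(limit == "", "value", limit))  # unprefixed columns hold the point value

# Spread the limit labels back out into the columns value, lower and upper
answer <- pivot_wider(longer, id_cols = c(group, name), names_from = limit, values_from = value, names_repair = "check_unique")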

Most of the selecting, separating, mutating and renaming takes place within the pivot function calls.

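Update:
The regular expression "(.*)(..)$" means:
( ) ( )  look for two capture groups;
(.*)  the first group matches zero or more characters;
(..)$  the second group matches exactly two characters, anchored by "$" to the end of the string.
The first group is greedy but must leave the last two characters for the second group, so "lowerBP" splits into "lower" and "BP", while "BP" splits into "" and "BP".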
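A quick way to check how the pattern splits the column names is to run it outside of pivot_longer. The sketch below uses base R's regexec and regmatches, which are not part of the original answer, on three of the column names; the variable names cols, parts and split_names are only for illustration.

```r
# A few of the column names from dat
cols <- c("BP", "lowerBP", "upperHS")

# regexec() returns the full match plus one entry per capture group
parts <- regmatches(cols, regexec("(.*)(..)$", cols))

# One row per column name: full match, first group (limit), second group (name)
split_names <- do.call(rbind, parts)
colnames(split_names) <- c("match", "limit", "name")
print(split_names)
```

The empty first group for "BP" is the reason the answer recodes "" to "value" in the mutate step.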

Dave2e
  • Cool, I'm more accustomed to this syntax. However, I'm not good with the "(.*)(..)$" symbols. Do you know if these are explained somewhere? – Droc Apr 22 '20 at 16:20
  • This works like magic. Your answer here saved me a whole afternoon of head scratching and hair pulling. – Faustin Gashakamba Sep 13 '22 at 11:10
6

A data.table version (I am not yet sure how to retain the original names so that you don't need to substitute them afterwards; see https://github.com/Rdatatable/data.table/issues/2551):

library(data.table)
df <- data.table(dat)
v <- c("BP","HS","BB")
setnames(df, v, paste0("x",v) )

g <- melt(df, id.vars = "group",
     measure.vars = patterns(values = "x" ,
                             lower = "lower",
                             upper = "upper"),
     variable.name = "names")

g[names==1, names := "BP" ]
g[names==2, names := "HS" ]
g[names==3, names := "BB" ]

    group names values lower upper
 1:     1    BP   0.51  0.16  0.18
 2:   2.1    BP   0.67  0.09  0.11
 3:   2.2    BP   0.36  0.12  0.15
 4:   2.3    BP   0.09  0.13  0.16
 5:   2.4    BP   0.68  0.12  0.14
 6:     3    BP   0.53  0.14  0.16
 7:     1    HS   0.15  0.50  0.52
 8:   2.1    HS   0.09  0.66  0.68
 9:   2.2    HS   0.13  0.34  0.38
10:   2.3    HS   0.17  0.08  0.11
11:   2.4    HS   0.12  0.66  0.69
12:     3    HS   0.15  0.52  0.53
13:     1    BB   0.05  0.14  0.16
14:   2.1    BB   0.06  0.08  0.10
15:   2.2    BB   0.07  0.12  0.14
16:   2.3    BB   0.09  0.15  0.18
17:   2.4    BB   0.07  0.11  0.13
18:     3    BB   0.06  0.15  0.16
desval
  • What about having the `names` in multiple columns and `values`, `lower`, and `upper` in a single column? – jmutua Jul 21 '22 at 12:14
6

I'd like to add an alternative tidyverse solution drawing from the answer provided by @Dave2e.

Like Dave2e's solution, it is a two-step procedure (first rename, then reshape). Instead of reshaping the data twice, I add the prefix "values" to the columns named "BP", "HS", and "BB" using rename_with. This is necessary to get the column names right when using the .value sentinel in the names_to argument of pivot_longer.

library(dplyr)
library(tidyr)

dat %>% 
  rename_with(~sub("^(BP|HS|BB)$", "values\\1", .)) %>%     # add prefix values
  pivot_longer(cols= -1,
               names_pattern = "(.*)(BP|HS|BB)$",
               names_to = c(".value", "names")) 
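To see what the rename step does on its own, here is a minimal base-R sketch that applies the same sub() call to a plain character vector of column names. The vectors cols and renamed are illustrative and not part of the answer's pipeline.

```r
# Some of the column names from dat
cols <- c("group", "BP", "HS", "BB", "lowerBP", "upperBP")

# Only names that consist entirely of BP, HS or BB get the "values" prefix;
# the anchors ^ and $ keep "lowerBP" and "upperBP" unchanged
renamed <- sub("^(BP|HS|BB)$", "values\\1", cols)
print(renamed)
```

After this step every measure column has the form prefix plus BP, HS or BB, so the first capture group of names_pattern can supply the output column names through .value.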
maraab
5

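Based on your example data, this solution works for me. It uses dplyr together with tidyr (gather, separate, spread) and tibble (rowid_to_column):

library(dplyr)
library(tidyr)   # gather(), separate(), spread()
library(tibble)  # rowid_to_column()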

dat %>%
  gather(key, values,-group) %>%
  mutate(names = gsub("lower","",gsub("upper","",key))) %>%
  # separate() has no perl argument, so the original "perl=T" is dropped here
  separate(key, into = c("key1","key2") ,"[[:upper:]]") %>%
  mutate(key1 = case_when(key1 == "" ~ "values", TRUE ~ key1)) %>%
  select(group,names,key1,values) %>%
  rowid_to_column() %>%
  spread(key1,values) %>%
  select(-rowid) %>%
  group_by(group,names) %>%
  summarise_all(mean,na.rm = TRUE)
Fnguyen
  • That is some serious code. Somehow this does not work for me: "Error: 1 components of `...` were not used. We detected these problematic arguments: * `perl`" – Droc Apr 22 '20 at 15:43
  • @Droc have you tried removing the ```, perl = T``` argument in the ```separate``` statement? – Fnguyen Apr 22 '20 at 15:48
  • @Droc also as a fun way to better understand what I did and how to improve/repeat go through each line by adding a ```head()``` to see what each operation does. – Fnguyen Apr 22 '20 at 15:49
  • separate doesn't have a perl argument, though, and never did. It might have been silently ignored in the past; see https://github.com/tidyverse/tidyr/issues/789 – moodymudskipper Feb 04 '21 at 03:12