Stack dataframe columns with two distinct suffix into two columns, preferably using tidyverse

Question

Suppose I have a list of dataframes, mylist and I want to do the same operation to each dataframes.

Say my dataframes look like this:

set.seed(1)
test.tbl <- tibble(
  case1_diff = rnorm(10,0),
  case1_avg = rnorm(10,0),
  case2_diff = rnorm(10,0),
  case2_avg = rnorm(10,0),
  case3_diff = rnorm(10,0),
  case3_avg = rnorm(10,0),
  case4_diff = rnorm(10,0),
  case4_avg = rnorm(10,0),
)
> head(test.tbl)
# A tibble: 6 x 8
  case1_diff case1_avg case2_diff case2_avg case3_diff case3_avg case4_diff case4_avg
       <dbl>     <dbl>      <dbl>     <dbl>      <dbl>     <dbl>      <dbl>     <dbl>
1     -0.626    1.51       0.919     1.36       -0.165     0.398     2.40       0.476
2      0.184    0.390      0.782    -0.103      -0.253    -0.612    -0.0392    -0.710
3     -0.836   -0.621      0.0746    0.388       0.697     0.341     0.690      0.611
4      1.60    -2.21      -1.99     -0.0538      0.557    -1.13      0.0280    -0.934
5      0.330    1.12       0.620    -1.38       -0.689     1.43     -0.743     -1.25 
6     -0.820   -0.0449    -0.0561   -0.415      -0.707     1.98      0.189      0.291

and I wish to stack them into two columns of diff and avg as 40 x 2 dataframe.

Normally, I would just separate it into two objects through select(ends_with("diff")) and select(ends_with("avg")), pivot them, then bind_rows.

However, since my original object is list, I want to do it using map like:

mylist %>%
   map(*insertfunction1*) %>%
   map(*insertfunction2*)

meaning I would need to do this without separating. I would also need to make sure that diff and avg is correctly paired.

What I have tried so far is

test.tbl %>%
  pivot_longer(cols=everything(),
               names_to = "metric") %>%
  mutate(metric = str_remove(metric,"[0-9]+")) %>%
  pivot_wider(id_cols=metric,
              values_from=value)

```reshape(as.data.frame(test.tbl), varying = list(c(1, 3, 5, 7), c(2, 4, 6, 8)), direction = 'long', v.names = c('diff', 'avg'), timevar = 'case', times = paste0('case', 1:4))``` — Eyayaw, Apr 05 '21 at 21:51

score 2 · Accepted Answer · answered Apr 05 '21 at 20:51

2

We don't need both pivot_longer and pivot_wider. it can be done within pivot_longer itself by specifying the names_to and the names_sep argument

library(dplyr)
library(tidyr)
test.tbl %>% 
     pivot_longer(cols = everything(), names_to = c('grp', '.value'),
            names_sep = "_") %>%
     select(-grp)

-output

# A tibble: 40 x 2
#      diff    avg
#     <dbl>  <dbl>
# 1 -0.626   1.51 
# 2  0.919   1.36 
# 3 -0.165   0.398
# 4  2.40    0.476
# 5  0.184   0.390
# 6  0.782  -0.103
# 7 -0.253  -0.612
# 8 -0.0392 -0.710
# 9 -0.836  -0.621
#10  0.0746  0.388
# … with 30 more rows

answered Apr 05 '21 at 20:51

akrun

874,273
37
540
662

Wow, great solution! Would you explain `.value`? The documentation states `.value indicates that component of the name defines the name of the column containing the cell values, overriding values_to.` but I am having difficulty grasping the concept. I'm guessing that's serving important role? – aiorr Apr 05 '21 at 20:59
1

@aiorr sorry, was away. When you specify `.value` with `names_sep`, it suggest the value part is from the suffix after the `_` and whatever before that in the column name will be used in creating the column 'grp' (which we removed afterwards in `select`). – akrun Apr 05 '21 at 21:54

Stack dataframe columns with two distinct suffix into two columns, preferably using tidyverse

1 Answers1

Related