Return statistics like min or max from columns into rows with dplyr pipeline

Question

My problem is similar to this: R dplyr rowwise mean or min and other methods? Wondering if there is any dplyr functions (or combination of functions such as pivot_ etc.), that might give the desired output in a usual dplyr one-liner?

library(tidyverse); set.seed(1); 

#Sample Data: 
sampleData <- data.frame(O = seq(1, 9, by = .1), A = rnorm(81), U = sample(1:81,
    81), I = rlnorm(81),  R = sample(c(1, 81), 81, replace = T)); #sampleData;
 
#NormalOuput:
NormalOuput <- sampleData %>% summarise_all(list(min = min, max = max)); 
NormalOuput;
#>   O_min   A_min U_min     I_min R_min O_max    A_max U_max    I_max R_max
#> 1     1 -2.2147     1 0.1970368     1     9 2.401618    81 14.27712    81

#Expected output:
ExpectedOuput <- data.frame(stats = c('min', 'max'), O = c(1, 9), A = c(-2.2147,
    2.401618), U = c(1, 81), I = c(0.1970368, 14.27712), R = c(1, 81)); 
ExpectedOuput;
#>   stats O         A  U          I  R
#> 1   min 1 -2.214700  1  0.1970368  1
#> 2   max 9  2.401618 81 14.2771200 81

^{Created on 2020-08-26 by the reprex package (v0.3.0)}

Note:

The number of columns might be huge in the real scenario, so the names cannot be called directly.

EDIT

At best, I get this:

sampleData %>% summarise(across(everything(), list(min = min, max = max))) %>% 
    t() %>% data.frame(Value = .) %>% tibble::rownames_to_column('Variables')

   Variables      Value
1      O_min  1.0000000
2      O_max  9.0000000
3      A_min -2.2146999
4      A_max  2.4016178
5      U_min  1.0000000
6      U_max 81.0000000
7      I_min  0.1970368
8      I_max 14.2771167
9      R_min  1.0000000
10     R_max 81.0000000

A `data.table` option: `library(data.table); setDT(sampleData); out <- sampleData[,lapply(.SD, function(x) c(min(x), max(x)))]; out` — markus, Aug 26 '20 at 17:59

score 1 · Accepted Answer · answered Aug 26 '20 at 17:52

I would suggest a mix of tidyverse functions like next. You have to reshape your data, then aggregate with the summary functions you want and then as strategy you can re format again and obtain the expected output:

library(tidyverse)

sampleData %>% pivot_longer(cols = names(sampleData)) %>%
  group_by(name) %>% summarise(Min=min(value,na.rm=T),
                               Max=max(value,na.rm=T)) %>% 
  rename(var=name) %>%
  pivot_longer(cols = -var) %>%
  pivot_wider(names_from = var,values_from=value)

The output:

# A tibble: 2 x 6
  name      A      I     O     R     U
  <chr> <dbl>  <dbl> <dbl> <dbl> <dbl>
1 Min   -2.21  0.197     1     1     1
2 Max    2.40 14.3       9    81    81

@Duck This might be more efficient (although not tested) : `library(purrr);map_dfr(sampleData, function(x) c(min(x), max(x)))` — markus, Aug 26 '20 at 18:01
@markus, instead of `min`/`max`, why not `range`? (also untested) — r2evans, Aug 26 '20 at 18:02

score 1 · Answer 2 · answered Aug 26 '20 at 18:33

You can use the new-ish across() to eliminate one of Duck's pivots:

sampleData %>%
  summarise(across(everything(),
                   list(min = min, max = max))) %>%
  pivot_longer(
    cols = everything(),
    names_to = c("var", "stat"),
    names_sep = "_"
  ) %>%
  pivot_wider(id_cols = "stat",
              names_from = "var")
# # A tibble: 2 x 6
#   stat      O     A     U      I     R
#   <chr> <dbl> <dbl> <dbl>  <dbl> <dbl>
# 1 min       1 -2.21     1  0.197     1
# 2 max       9  2.40    81 14.3      81

But the nicest is probably markus's suggestion in comments, which I've adapted here:

map_dfr(sampleData, function(x) c(min(x), max(x))) %>%
  mutate(stat = c("min", "max"))
# # A tibble: 2 x 6
#       O     A     U      I     R stat 
#   <dbl> <dbl> <int>  <dbl> <dbl> <chr>
# 1     1 -2.21     1  0.197     1 min  
# 2     9  2.40    81 14.3      81 max

This probably would be the shortest one-liner: `sampleData %>% purrr::map_dfr(~c(min(.), max(.))) %>% mutate(stats = c('min', 'max'))` — massisenergy, Aug 26 '20 at 19:20
Yeah. I'm annoyed that I can't find a way to eliminate the `mutate`. I really though either using the `.id` argument or by naming the result vector I could get the `stat` column added inside the `map_dfr` call. — Gregor Thomas, Aug 26 '20 at 19:33

massisenergy · Answer 3 · 2020-08-28T12:19:31.750

While playing with pivot_longer, I discovered that this two-step one-liner also works (building on the answer by @Gregor Thomas, here only one pivot_ in stead of two or more):

sampleData %>% 
    summarise(across(everything(), list(min, max))) %>% 
        pivot_longer(everything(), names_to = c(".value", "stats"),
                     names_sep = "_")

# A tibble: 2 x 6
  stats     O     A     U      I     R
  <chr> <dbl> <dbl> <int>  <dbl> <dbl>
1 1         1 -2.21     1  0.197     1
2 2         9  2.40    81 14.3      81

More here: https://tidyr.tidyverse.org/reference/pivot_longer.html#examples

Return statistics like min or max from columns into rows with dplyr pipeline

3 Answers3