0

My problem is similar to this: R dplyr rowwise mean or min and other methods? Wondering if there is any functions (or combination of functions such as pivot_ etc.), that might give the desired output in a usual dplyr one-liner?

library(tidyverse); set.seed(1); 

#Sample Data: 
sampleData <- data.frame(O = seq(1, 9, by = .1), A = rnorm(81), U = sample(1:81,
    81), I = rlnorm(81),  R = sample(c(1, 81), 81, replace = T)); #sampleData;
 
#NormalOuput:
NormalOuput <- sampleData %>% summarise_all(list(min = min, max = max)); 
NormalOuput;
#>   O_min   A_min U_min     I_min R_min O_max    A_max U_max    I_max R_max
#> 1     1 -2.2147     1 0.1970368     1     9 2.401618    81 14.27712    81

#Expected output:
ExpectedOuput <- data.frame(stats = c('min', 'max'), O = c(1, 9), A = c(-2.2147,
    2.401618), U = c(1, 81), I = c(0.1970368, 14.27712), R = c(1, 81)); 
ExpectedOuput;
#>   stats O         A  U          I  R
#> 1   min 1 -2.214700  1  0.1970368  1
#> 2   max 9  2.401618 81 14.2771200 81

Created on 2020-08-26 by the reprex package (v0.3.0)

Note:

The number of columns might be huge in the real scenario, so the names cannot be called directly.

EDIT

At best, I get this:

sampleData %>% summarise(across(everything(), list(min = min, max = max))) %>% 
    t() %>% data.frame(Value = .) %>% tibble::rownames_to_column('Variables')

   Variables      Value
1      O_min  1.0000000
2      O_max  9.0000000
3      A_min -2.2146999
4      A_max  2.4016178
5      U_min  1.0000000
6      U_max 81.0000000
7      I_min  0.1970368
8      I_max 14.2771167
9      R_min  1.0000000
10     R_max 81.0000000

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
massisenergy
  • 1,764
  • 3
  • 14
  • 25
  • 1
    A `data.table` option: `library(data.table); setDT(sampleData); out <- sampleData[,lapply(.SD, function(x) c(min(x), max(x)))]; out` – markus Aug 26 '20 at 17:59

3 Answers3

1

I would suggest a mix of tidyverse functions like next. You have to reshape your data, then aggregate with the summary functions you want and then as strategy you can re format again and obtain the expected output:

library(tidyverse)

sampleData %>% pivot_longer(cols = names(sampleData)) %>%
  group_by(name) %>% summarise(Min=min(value,na.rm=T),
                               Max=max(value,na.rm=T)) %>% 
  rename(var=name) %>%
  pivot_longer(cols = -var) %>%
  pivot_wider(names_from = var,values_from=value)

The output:

# A tibble: 2 x 6
  name      A      I     O     R     U
  <chr> <dbl>  <dbl> <dbl> <dbl> <dbl>
1 Min   -2.21  0.197     1     1     1
2 Max    2.40 14.3       9    81    81
Duck
  • 39,058
  • 13
  • 42
  • 84
  • 1
    @Duck This might be more efficient (although not tested) : `library(purrr);map_dfr(sampleData, function(x) c(min(x), max(x)))` – markus Aug 26 '20 at 18:01
  • 1
    @markus, instead of `min`/`max`, why not `range`? (also untested) – r2evans Aug 26 '20 at 18:02
1

You can use the new-ish across() to eliminate one of Duck's pivots:

sampleData %>%
  summarise(across(everything(),
                   list(min = min, max = max))) %>%
  pivot_longer(
    cols = everything(),
    names_to = c("var", "stat"),
    names_sep = "_"
  ) %>%
  pivot_wider(id_cols = "stat",
              names_from = "var")
# # A tibble: 2 x 6
#   stat      O     A     U      I     R
#   <chr> <dbl> <dbl> <dbl>  <dbl> <dbl>
# 1 min       1 -2.21     1  0.197     1
# 2 max       9  2.40    81 14.3      81

But the nicest is probably markus's suggestion in comments, which I've adapted here:

map_dfr(sampleData, function(x) c(min(x), max(x))) %>%
  mutate(stat = c("min", "max"))
# # A tibble: 2 x 6
#       O     A     U      I     R stat 
#   <dbl> <dbl> <int>  <dbl> <dbl> <chr>
# 1     1 -2.21     1  0.197     1 min  
# 2     9  2.40    81 14.3      81 max
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • This probably would be the shortest one-liner: `sampleData %>% purrr::map_dfr(~c(min(.), max(.))) %>% mutate(stats = c('min', 'max'))` – massisenergy Aug 26 '20 at 19:20
  • 1
    Yeah. I'm annoyed that I can't find a way to eliminate the `mutate`. I really though either using the `.id` argument or by naming the result vector I could get the `stat` column added inside the `map_dfr` call. – Gregor Thomas Aug 26 '20 at 19:33
0

While playing with pivot_longer, I discovered that this two-step one-liner also works (building on the answer by @Gregor Thomas, here only one pivot_ in stead of two or more):

sampleData %>% 
    summarise(across(everything(), list(min, max))) %>% 
        pivot_longer(everything(), names_to = c(".value", "stats"),
                     names_sep = "_")

# A tibble: 2 x 6
  stats     O     A     U      I     R
  <chr> <dbl> <dbl> <int>  <dbl> <dbl>
1 1         1 -2.21     1  0.197     1
2 2         9  2.40    81 14.3      81

More here: https://tidyr.tidyverse.org/reference/pivot_longer.html#examples

massisenergy
  • 1,764
  • 3
  • 14
  • 25