dplyr: getting grouped min and max of columns in a for loop

Question

I am trying to get the grouped min and max of several columns using a for loop:

My data:

df <- data.frame(a=c(1:5, NA), b=c(6:10, NA), c=c(11:15, NA), group=c(1,1,1,2,2,2))
> df
   a  b  c group
1  1  6 11     1
2  2  7 12     1
3  3  8 13     1
4  4  9 14     2
5  5 10 15     2
6 NA NA NA     2

My attempt:

cols <- df %>% select(a,b) %>% names()

for(i in seq_along(cols)) {
  output <- df %>% dplyr::group_by(group) %>% 
             dplyr::summarise_(min=min(.dots=i, na.rm=T), max=max(.dots=i, na.rm=T))
  print(output)
}

Desired output for column a:

  group   min   max
  <dbl> <int> <int>
1     1     1     3
2     2     4     5

AlexB · Accepted Answer · 2020-01-05T19:59:08.207

Using dplyr package, you can get:

df %>%
  na.omit() %>% 
  pivot_longer(-group) %>%
  group_by(group, name) %>%
  summarise(min = min(value),
            max = max(value)) %>%
  arrange(name, group)

# group name    min   max
# <dbl> <chr> <int> <int>
# 1     1 a         1     3
# 2     2 a         4     5
# 3     1 b         6     8
# 4     2 b         9    10
# 5     1 c        11    13
# 6     2 c        14    15

akrun · Answer 2 · 2020-01-05T20:18:26.693

We can use summarise_all after grouping by 'group' and if it needs to be in a particular order, then use select to select based on the column names

library(dplyr)
library(stringr)
df %>%
    group_by(group) %>% 
    summarise_all(list(min = ~ min(., na.rm = TRUE),
                       max = ~ max(., na.rm = TRUE))) %>%
    select(group, order(str_remove(names(.), "_.*")))
# A tibble: 2 x 7
#  group a_min a_max b_min b_max c_min c_max
#  <dbl> <int> <int> <int> <int> <int> <int>
#1     1     1     3     6     8    11    13
#2     2     4     5     9    10    14    15

dc37 · Answer 3 · 2020-01-05T20:03:51.263

Without to use for loop but using dplyr and tidyr from tidyverse, you can get the min and max of each columns by 1) pivoting the dataframe in a longer format, 2) getting the min and max value per group and then 3) pivoting wider the dataframe to get the expected output:

library(tidyverse)
df %>% pivot_longer(., cols = c(a,b,c), names_to = "Names",values_to = "Value") %>%
  group_by(group,Names) %>% summarise(Min = min(Value, na.rm =TRUE), Max = max(Value,na.rm = TRUE)) %>%
  pivot_wider(., names_from = Names, values_from = c(Min,Max)) %>%
  select(group,contains("_a"),contains("_b"),contains("_c"))

# A tibble: 2 x 7
# Groups:   group [2]
  group Min_a Max_a Min_b Max_b Min_c Max_c
  <dbl> <int> <int> <int> <int> <int> <int>
1     1     1     3     6     8    11    13
2     2     4     5     9    10    14    15

Is it what you are looking for ?

score 1 · Answer 4 · answered Jan 06 '20 at 04:16

In base R, we can use aggregate and get min and max for multiple columns by group.

aggregate(.~group, df, function(x) 
           c(min = min(x, na.rm = TRUE),max= max(x, na.rm = TRUE)))

#  group a.min a.max b.min b.max c.min c.max
#1     1     1     3     6     8    11    13
#2     2     4     5     9    10    14    15

dplyr: getting grouped min and max of columns in a for loop

4 Answers4