0

I have a data set which I want to group and then get the maximum value, second largest value and total for each group output. On the example data set mtcars it looks like this:

df <- mtcars %>% group_by(cyl) %>%
  dplyr::summarise(
    MaxVal = max(hp, na.rm = T),
    MAXsecVal = max(hp[hp != max(hp)], na.rm = F),
    Sum = sum(hp)
  ) %>%  arrange(cyl)

 cyl MaxVal MAXsecVal   Sum
  <dbl>  <dbl>     <dbl> <dbl>
1     4    113       109   909
2     6    175       123   856
3     8    335       264  2929

and works. But now I would like to access the (max, sum) variables dynamically. The best way would be a character vector like this:

var <- c("hp", "drat", "wt")

which can be run over the existing programming by a loop or something similar. Does anyone have an idea?

Hekmek
  • 3
  • 2
  • Does this answer your question? [Dynamically selecting multiple columns for group\_by](https://stackoverflow.com/questions/68883069/dynamically-selecting-multiple-columns-for-group-by) – user438383 Sep 11 '21 at 13:42

1 Answers1

0

Please find below a solution to your problem (if I understood it correctly!):

library(dplyr)


# Creating a function:

my_function <- function(table, groupingVar, x){
  
  df <- table %>% group_by(get(groupingVar)) %>%
    dplyr::summarise(
      MaxVal = max(get(x), na.rm = T),
      MAXsecVal = max(get(x)[get(x) != max(get(x))], na.rm = F),
      Sum = sum(get(x))
    ) %>%  arrange(get(deparse(substitute(groupingVar))))
  
  return(df)
  
}


# Defining variables:

table <- mtcars                        # select your data.frame/tibble
groupingVar <- "cyl"                   # select the grouping variable
selectedCols <- c("hp", "drat", "wt")  # select the columns to be computed


# Using the function in a loop for:

results <- list()

for (i in seq(selectedCols)){
  
  results[[i]] <- my_function(table = table, 
                              groupingVar = groupingVar, 
                              x=selectedCols[i])
  
  names(results)[i] <- selectedCols[i]
  colnames(results[[i]])[1] <- get(deparse(substitute(groupingVar)))
}
#> `summarise()` ungrouping output (override with `.groups` argument)
#> `summarise()` ungrouping output (override with `.groups` argument)
#> `summarise()` ungrouping output (override with `.groups` argument)


# Results in a list:

results
#> $hp
#> # A tibble: 3 x 4
#>     cyl MaxVal MAXsecVal   Sum
#>   <dbl>  <dbl>     <dbl> <dbl>
#> 1     4    113       109   909
#> 2     6    175       123   856
#> 3     8    335       264  2929
#> 
#> $drat
#> # A tibble: 3 x 4
#>     cyl MaxVal MAXsecVal   Sum
#>   <dbl>  <dbl>     <dbl> <dbl>
#> 1     4   4.93      4.43  44.8
#> 2     6   3.92      3.9   25.1
#> 3     8   4.22      3.73  45.2
#> 
#> $wt
#> # A tibble: 3 x 4
#>     cyl MaxVal MAXsecVal   Sum
#>   <dbl>  <dbl>     <dbl> <dbl>
#> 1     4   3.19      3.15  25.1
#> 2     6   3.46      3.44  21.8
#> 3     8   5.42      5.34  56.0

Created on 2021-09-12 by the reprex package (v0.3.0)

lovalery
  • 4,524
  • 3
  • 14
  • 28
  • Please not that I change "wp" by "wt" because the mtcars$wp column does not exist! It would be nice to make the change in your question. – lovalery Sep 12 '21 at 14:29
  • Thanks a lot. That is exactly what I wanted to have. However, I get an error message: Error: Length of ordering vectors don't match data frame size Any idea? – Hekmek Sep 13 '21 at 12:40
  • Actually it's difficult to answer without seeing your data. Normally, if your data.frame has the same structure as mtcars, it should work. The error message that R returns means that the column corresponding to the "groupingVar" does not have the same number of rows as the other columns in the result table. Maybe try by removing this line of code from the function (be careful to keep the closing parenthesis at the beginning of the line): %>% arrange(get(deparse(substitute(groupingVar)))) – lovalery Sep 13 '21 at 17:30
  • I had the same error with the mtcar data set. By removing the last line of code from the function as you described it worked. Thanks! – Hekmek Sep 15 '21 at 17:38
  • You're welcome. I am very happy that it works and wish you all the best. FYI, I have redone the tests and both versions of the function still work on my side... the difference is probably related to different environments and/or a different version of the dplyr package – lovalery Sep 15 '21 at 21:23