Suppose I have some random data:

year = mpg$year

year[mpg$year>2006] = NA

Now I want to aggregate and apply the sum function twice, once with na.rm = T and once with na.rm = F. Is there a way to pass the same argument twice, once for the first sum call and once for the second (without using FUN = function(x))? Something like this:

aggregate(year, by = list(mpg$model), plyr::each(sum, sum), na.rm = T, na.rm = F)

pgitti
  • I suggest there is too much ambiguity with that: who is to say whether any other arguments you provide should go to one function, the other, or both/all of them? – r2evans Aug 30 '21 at 14:04
  • Very related: [Apply several summary functions on several variables by group in one call](https://stackoverflow.com/q/12064202/1422451) – Parfait Aug 30 '21 at 14:38

2 Answers

Using mtcars, which is built into R, we first insert some NAs.

With the formula method of aggregate, set na.action = na.pass so that aggregate does not drop rows containing NAs before the function is applied. Then pass the Sum function defined below, which returns both sums at once.

Note that the output of aggregate, a2, has two columns, the second of which is itself a two-column matrix. If we want three ordinary columns, use a3 <- do.call("data.frame", a2) as shown below.

mtcars$mpg[1:3] <-  NA # insert some NA's
Sum <- function(x) c(sum1 = sum(x, na.rm = FALSE), sum2 = sum(x, na.rm = TRUE))
a2 <- aggregate(mpg ~ cyl, mtcars, FUN = Sum, na.action = na.pass); a2
##   cyl mpg.sum1 mpg.sum2
## 1   4       NA    270.5
## 2   6       NA     96.2
## 3   8    211.4    211.4

str(a2)
## 'data.frame':   3 obs. of  2 variables:
##  $ cyl: num  4 6 8
##  $ mpg: num [1:3, 1:2] NA NA 211.4 270.5 96.2 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:2] "sum1" "sum2"

a3 <- do.call("data.frame", a2); a3
##   cyl mpg.sum1 mpg.sum2
## 1   4       NA    270.5
## 2   6       NA     96.2
## 3   8    211.4    211.4

str(a3)
## 'data.frame':   3 obs. of  3 variables:
##  $ cyl     : num  4 6 8
##  $ mpg.sum1: num  NA NA 211
##  $ mpg.sum2: num  270.5 96.2 211.4

Using the data.frame method of aggregate is similar except that na.action is no longer an argument and NA's are not removed by default.

aggregate(mtcars["mpg"], mtcars["cyl"], Sum)
##   cyl mpg.sum1 mpg.sum2
## 1   4       NA    270.5
## 2   6       NA     96.2
## 3   8    211.4    211.4
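
The Sum helper above hard-codes sum and the two na.rm settings. As a rough sketch of how the same idea could be generalized, the following builds such a function from any summary function that accepts na.rm. The name with_na_rm and the resulting column labels are hypothetical and not part of the original answer, and a copy d of mtcars is used so the built-in data set is left untouched.

```r
# Hypothetical helper (an assumption, not from the answer): returns a function
# that calls `f` once per na.rm setting and collects the results in a named vector.
with_na_rm <- function(f, na.rm = c(FALSE, TRUE)) {
  function(x) {
    out <- vapply(na.rm, function(r) f(x, na.rm = r), numeric(1))
    names(out) <- paste0("na.rm.", na.rm)
    out
  }
}

d <- mtcars
d$mpg[1:3] <- NA  # same NA pattern as in the answer

# data.frame method of aggregate: NAs are passed through to the function
res <- aggregate(d["mpg"], d["cyl"], with_na_rm(sum))

# flatten the matrix column into ordinary columns
res <- do.call("data.frame", res)
names(res)  # "cyl", "mpg.na.rm.FALSE", "mpg.na.rm.TRUE"
```

Because the helper only assumes an na.rm argument, it should also work with functions such as mean or max, although that has not been checked here for every case.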

Alternatives

collap from the collapse package is similar to aggregate but accepts a list of functions. The package also supplies fsum, which removes NAs by default. summaryBy in the doBy package also supports a list of functions. dplyr's summarize takes each summary as a separate argument instead of a list, and data.table can perform the aggregation using its own notation.

library(collapse)
collap(mtcars, mpg ~ cyl, c(sum, fsum), keep.col.order = FALSE)
##   cyl sum.mpg fsum.mpg
## 1   4      NA    270.5
## 2   6      NA     96.2
## 3   8   211.4    211.4

library(doBy)
summaryBy(mpg ~ cyl, mtcars, FUN = c(sum, function(x) sum(x, na.rm = TRUE)), 
  fun.names = c("sum1", "sum2"))
##   cyl mpg.sum1 mpg.sum2
## 1   4       NA    270.5
## 2   6       NA     96.2
## 3   8    211.4    211.4

library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarize(sum1 = sum(mpg), sum2 = sum(mpg, na.rm = TRUE), .groups = "drop")
## # A tibble: 3 x 3
##     cyl  sum1  sum2
##   <dbl> <dbl> <dbl>
## 1     4   NA  270. 
## 2     6   NA   96.2
## 3     8  211. 211. 

library(data.table)
as.data.table(mtcars)[, .(sum1 = sum(mpg), sum2 = sum(mpg, na.rm = TRUE)), by = cyl]
##    cyl  sum1  sum2
## 1:   6    NA  96.2
## 2:   4    NA 270.5
## 3:   8 211.4 211.4
G. Grothendieck

Another option is to call aggregate inside Map or lapply, once per na.rm value.

x <- mtcars
x$mpg[1:3] <-  NA
Map(aggregate, list(x$mpg), list(list(x$cyl)), "sum", na.rm=c(TRUE, FALSE))
#[[1]]
#  Group.1     x
#1       4 270.5
#2       6  96.2
#3       8 211.4
#
#[[2]]
#  Group.1     x
#1       4    NA
#2       6    NA
#3       8 211.4
lapply(list(T=TRUE, F=FALSE), function(y) aggregate(x$mpg, x["cyl"], sum, na.rm=y))
#$T
#  cyl     x
#1   4 270.5
#2   6  96.2
#3   8 211.4
#
#$F
#  cyl     x
#1   4    NA
#2   6    NA
#3   8 211.4

Or create a wrapper around sum whose NA-removal argument has a different name than na.rm, so that the two arguments can be passed side by side.

Sum <- function(x, Na.rm, ...) sum(x, na.rm = Na.rm)
aggregate(x$mpg, x["cyl"], plyr::each(sum, Sum), na.rm = TRUE, Na.rm = FALSE)
#  cyl x.sum x.Sum
#1   4 270.5    NA
#2   6  96.2    NA
#3   8 211.4 211.4
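
One thing worth checking with this approach: plyr::each forwards all extra arguments to every function, so the plain sum also receives Na.rm. As far as I can tell, sum does not treat Na.rm as na.rm (argument names are case-sensitive) and simply adds it to the total. A small illustration with a made-up vector v:

```r
v <- c(1, 2, NA)

# Na.rm = FALSE falls into sum()'s `...` and is added as 0, so it is harmless
s_false <- sum(v, na.rm = TRUE, Na.rm = FALSE)
s_false  # 3

# Na.rm = TRUE would be added as 1 and silently change the total
s_true <- sum(v, na.rm = TRUE, Na.rm = TRUE)
s_true   # 4
```

So the call above gives the expected numbers because Na.rm is FALSE. Swapping the two settings (na.rm = FALSE, Na.rm = TRUE) would likely add 1 to every total computed by the plain sum.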

Personally, though, I would prefer to write a small anonymous function (even though the question asked to avoid this).

aggregate(x$mpg, x["cyl"], function(x) c(T = sum(x, na.rm = TRUE), F = sum(x)))
#  cyl   x.T   x.F
#1   4 270.5    NA
#2   6  96.2    NA
#3   8 211.4 211.4
GKi