Using mtcars, which is built into R, we first insert some NAs. Then, using the formula method of aggregate, set na.action = na.pass in aggregate to prevent it from automatically removing the NAs, and pass the Sum function defined below as FUN.
Note that the output of aggregate, a2, will have two columns, where the second column is itself a two-column matrix. If we want three columns, use a3 <- do.call("data.frame", a2) as shown below.
mtcars$mpg[1:3] <- NA # insert some NA's
Sum <- function(x) c(sum1 = sum(x, na.rm = FALSE), sum2 = sum(x, na.rm = TRUE))
a2 <- aggregate(mpg ~ cyl, mtcars, FUN = Sum, na.action = na.pass); a2
## cyl mpg.sum1 mpg.sum2
## 1 4 NA 270.5
## 2 6 NA 96.2
## 3 8 211.4 211.4
str(a2)
## 'data.frame': 3 obs. of 2 variables:
## $ cyl: num 4 6 8
## $ mpg: num [1:3, 1:2] NA NA 211.4 270.5 96.2 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:2] "sum1" "sum2"
a3 <- do.call("data.frame", a2); a3
## cyl mpg.sum1 mpg.sum2
## 1 4 NA 270.5
## 2 6 NA 96.2
## 3 8 211.4 211.4
str(a3)
## 'data.frame': 3 obs. of 3 variables:
## $ cyl : num 4 6 8
## $ mpg.sum1: num NA NA 211
## $ mpg.sum2: num 270.5 96.2 211.4
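To make the matrix column a bit more concrete, here is a minimal sketch that rebuilds the example on a copy of mtcars and pulls the sums out of the matrix by column name. The names df and a2_demo are illustrative only and are not part of the code above.

```r
# Work on a copy so the built-in mtcars is left untouched (df is an illustrative name)
df <- mtcars
df$mpg[1:3] <- NA

Sum <- function(x) c(sum1 = sum(x, na.rm = FALSE), sum2 = sum(x, na.rm = TRUE))
a2_demo <- aggregate(mpg ~ cyl, df, FUN = Sum, na.action = na.pass)

# The data frame reports two columns; the second one is a matrix
ncol(a2_demo)
is.matrix(a2_demo$mpg)

# Index the matrix column like any other matrix
sum2 <- a2_demo$mpg[, "sum2"]
sum2

# Flattening with do.call gives three ordinary columns
names(do.call("data.frame", a2_demo))
```

If only one of the statistics is needed downstream, indexing the matrix column directly may be enough, and the do.call step can be skipped.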
Using the data.frame method of aggregate is similar, except that na.action is no longer an argument and NAs are not removed by default.
aggregate(mtcars["mpg"], mtcars["cyl"], Sum)
## cyl mpg.sum1 mpg.sum2
## 1 4 NA 270.5
## 2 6 NA 96.2
## 3 8 211.4 211.4
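For contrast, the following sketch shows what the formula method does when na.action is left at its default, na.omit: the rows with NA are dropped before FUN is called, so both sums come out the same. It works on a copy of mtcars, and the names df, dropped and kept are illustrative only.

```r
# Work on a copy so the built-in mtcars is left untouched (df is an illustrative name)
df <- mtcars
df$mpg[1:3] <- NA

Sum <- function(x) c(sum1 = sum(x, na.rm = FALSE), sum2 = sum(x, na.rm = TRUE))

# Default na.action = na.omit: NA rows are removed before Sum sees the data,
# so sum1 and sum2 are identical and contain no NA
dropped <- aggregate(mpg ~ cyl, df, FUN = Sum)
dropped$mpg

# na.action = na.pass: the NAs reach Sum, so sum1 is NA for the affected groups
kept <- aggregate(mpg ~ cyl, df, FUN = Sum, na.action = na.pass)
kept$mpg
```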
Alternatives
collap from the collapse package is similar to aggregate but does allow a list of functions. It also supplies fsum, which defaults to removing NAs. summaryBy in the doBy package also supports a list of functions. dplyr's summarize uses separate arguments instead of a list, and data.table can perform aggregation using its own notation.
library(collapse)
collap(mtcars, mpg ~ cyl, c(sum, fsum), keep.col.order = FALSE)
## cyl sum.mpg fsum.mpg
## 1 4 NA 270.5
## 2 6 NA 96.2
## 3 8 211.4 211.4
library(doBy)
summaryBy(mpg ~ cyl, mtcars, FUN = c(sum, function(x) sum(x, na.rm = TRUE)),
fun.names = c("sum1", "sum2"))
## cyl mpg.sum1 mpg.sum2
## 1 4 NA 270.5
## 2 6 NA 96.2
## 3 8 211.4 211.4
library(dplyr)
mtcars %>%
group_by(cyl) %>%
summarize(sum1 = sum(mpg), sum2 = sum(mpg, na.rm = TRUE), .groups = "drop")
## # A tibble: 3 x 3
## cyl sum1 sum2
## <dbl> <dbl> <dbl>
## 1 4 NA 270.
## 2 6 NA 96.2
## 3 8 211. 211.
library(data.table)
as.data.table(mtcars)[, .(sum1 = sum(mpg), sum2 = sum(mpg, na.rm = TRUE)), by = cyl]
## cyl sum1 sum2
## 1: 6 NA 96.2
## 2: 4 NA 270.5
## 3: 8 211.4 211.4