Here's a simple data frame with a missing value:
M = data.frame( Name = c('name', 'name'), Col1 = c(NA, 1) , Col2 = c(1, 1))
# Name Col1 Col2
# 1 name NA 1
# 2 name 1 1
When I use aggregate to sum variables by group ('Name') using the formula method:
aggregate(. ~ Name, M, FUN = sum, na.rm = TRUE)
the result is:
# Name Col1 Col2
# name 1 1
So the entire first row, which has an NA, is ignored. But if I use the "non-formula" specification:
aggregate(M[, 2:3], by = list(M$Name), FUN = sum, na.rm = TRUE)
the result is:
# Group.1 Col1 Col2
# name 1 2
Here only the (1,1) entry is ignored.
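For reference, the difference can be reproduced in one self-contained snippet. As far as I can tell from the aggregate documentation, the formula method takes an na.action argument that defaults to na.omit, so rows with any NA are dropped before FUN is ever called. The last call below is only a sketch of how the two interfaces might be made to agree by passing na.action = na.pass; the variable names res_formula, res_by and res_pass are mine, chosen for illustration.

```r
# Same data as above
M <- data.frame(Name = c('name', 'name'), Col1 = c(NA, 1), Col2 = c(1, 1))

# Formula method: with the default na.action = na.omit, any row
# containing an NA is removed before sum() is applied
res_formula <- aggregate(. ~ Name, M, FUN = sum, na.rm = TRUE)

# Non-formula method: all rows are kept, and na.rm = TRUE is
# handled inside sum() for each column separately
res_by <- aggregate(M[, 2:3], by = list(Name = M$Name), FUN = sum, na.rm = TRUE)

# Formula method with na.action = na.pass (assumption: this is the
# intended way to keep incomplete rows), so sum() itself drops the NA
res_pass <- aggregate(. ~ Name, M, FUN = sum, na.rm = TRUE, na.action = na.pass)

print(res_formula)
print(res_by)
print(res_pass)
```

With na.action = na.pass, the formula call keeps the first row, so Col2 sums over both rows just as in the non-formula call.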
This caused a major debugging headache in some of my code, since I thought these two calls were equivalent. Is there a good reason why the formula entry method is treated differently?