Understanding how the two aggregate() syntaxes handle a data frame containing NA values

Question

Here is an example data frame.

    x3 <- read.table(text = "  id1 id2 val1 val2
    1   a   x    1    9
    2   a   x    2    4
    3   a   y    3    NA
    4   a   y    4    NA
    5   b   x    1    NA
    6   b   y    4    NA
    7   b   x    3    9
    8   b   y    2    8", header = TRUE)

`aggregate(. ~ id1+id2, data = x3, FUN = mean)` returns:

  id1 id2 val1 val2
1   a   x  1.5  6.5
2   b   x  3.0  9.0
3   b   y  2.0  8.0

`aggregate(x3[,3:4], by = list(x3$id1, x3$id2), FUN = mean, na.rm = TRUE)` returns:

  Group.1 Group.2 val1 val2
1       a       x  1.5  6.5
2       b       x  2.0  9.0
3       a       y  3.5  NaN
4       b       y  3.0  8.0

The two aggregate() syntaxes do not return the same number of rows. What is the reason?

Because in the original df all `val2` corresponding to the combination `id1 == "a", id2 == "y"` are `NA`. That group doesn't show up in the first case but it does in the second. — Rui Barradas, Jul 04 '20 at 17:19
To get the same result using the formula interface: `aggregate(. ~ id1+id2, data = x3, FUN = mean, na.action = na.pass, na.rm = TRUE)`. — Axeman, Jul 04 '20 at 17:35
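
To make the two comments above concrete, here is a minimal sketch. It assumes the cause is the formula interface's default `na.action = na.omit`, which drops every row containing an `NA` in any variable before grouping. The object names `f_default`, `f_manual` and `f_pass` are only illustrative.

```r
x3 <- read.table(text = "id1 id2 val1 val2
a x 1 9
a x 2 4
a y 3 NA
a y 4 NA
b x 1 NA
b y 4 NA
b x 3 9
b y 2 8", header = TRUE)

# Formula interface with its default na.action = na.omit:
# rows 3 to 6 are removed before grouping, so the group (a, y) disappears.
f_default <- aggregate(. ~ id1 + id2, data = x3, FUN = mean)

# The same thing done by hand: drop incomplete rows first, then aggregate.
f_manual <- aggregate(. ~ id1 + id2, data = na.omit(x3), FUN = mean)

# Keep all rows (na.pass) and let mean() drop NAs column by column instead.
f_pass <- aggregate(. ~ id1 + id2, data = x3, FUN = mean,
                    na.action = na.pass, na.rm = TRUE)

nrow(f_default)  # 3 groups
nrow(f_pass)     # 4 groups; val2 for (a, y) is NaN
```

Note that the two approaches also differ in `val1`: with `na.omit`, a row is lost for `val1` even when only `val2` is missing, which is why `b x` has a `val1` mean of 3.0 in one result and 2.0 in the other.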

score 1 · Answer 1 · answered Jul 04 '20 at 17:21

It is better to use `with` and `complete.cases` with the list method of `aggregate`, so that rows with missing values are excluded beforehand, which is probably what you are trying to do.

with(x3[complete.cases(x3), ], aggregate(cbind(val1, val2), by=list(id1, id2), FUN=mean))
#   Group.1 Group.2 val1 val2
# 1       a       x  1.5  6.5
# 2       b       x  3.0  9.0
# 3       b       y  2.0  8.0
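
As a small variation on the call above (a sketch, not part of the original answer), naming the elements of the `by` list should give the grouping columns the same names as the formula interface, which makes the two results easy to compare. The names `cc`, `by_list` and `by_formula` are hypothetical.

```r
x3 <- read.table(text = "id1 id2 val1 val2
a x 1 9
a x 2 4
a y 3 NA
a y 4 NA
b x 1 NA
b y 4 NA
b x 3 9
b y 2 8", header = TRUE)

# Keep only rows without any missing value.
cc <- x3[complete.cases(x3), ]

# Named list elements become the grouping column names (id1, id2)
# instead of the default Group.1, Group.2.
by_list <- with(cc, aggregate(cbind(val1, val2),
                              by = list(id1 = id1, id2 = id2),
                              FUN = mean))

# Formula interface with its default handling of NA rows.
by_formula <- aggregate(. ~ id1 + id2, data = x3, FUN = mean)

names(by_list)   # "id1" "id2" "val1" "val2"
all.equal(by_list$val1, by_formula$val1)
all.equal(by_list$val2, by_formula$val2)
```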
