Why are some grouping values dropped in aggregate()?

Question

When I aggregate a data frame as shown below, I notice that some values of the grouping column are dropped from the result:

    set.seed(100)
    b <- data.frame(id=sample(1:3, 5, replace=TRUE),
         prop1=sample(c(TRUE,FALSE),5, replace = TRUE),
         prop2= sample(c(TRUE,FALSE,NA), 5, replace= TRUE))

    > b
      id prop1 prop2
    1  3 FALSE  TRUE
    2  1 FALSE    NA
    3  2 FALSE    NA
    4  2 FALSE FALSE
    5  3  TRUE  TRUE
    > aggregate(. ~ id, b, function(x) { length(x[x == TRUE])/length(x)})
      id prop1 prop2
    1  2   0.0     0
    2  3   0.5     1

What happened to id 1 here? Why is it dropped?

Because `prop2` for `id=1` is `NA`. P.S. Whenever you use `sample` in code and ask a question here, please call `set.seed` so the example is exactly reproducible – Alexey Ferapontov, Feb 08 '17 at 17:50
Why isn't `id=2` getting dropped? `prop2` for `id=2` is `NA` as well – user3206440, Feb 08 '17 at 21:18

score 0 · Accepted Answer · answered Feb 08 '17 at 20:44

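If you look at the help for `aggregate` (`?aggregate`), you will see that the formula method has an argument that specifies how missing values are treated: `na.action`. Its default is `na.omit`, so any row with an `NA` in any of the variables is removed before the groups are formed. After some trials, I found a seed that recreates your issue ;)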

    set.seed(3)
    b <- data.frame(id=sample(1:6, 10, replace=TRUE),
                    prop1=sample(c(TRUE,FALSE), 10, replace=TRUE),
                    prop2=sample(c(TRUE,FALSE,NA), 10, replace=TRUE))
    b

       id prop1 prop2
    1   3  TRUE  TRUE
    2   6  TRUE    NA
    3   4 FALSE FALSE
    4   4 FALSE  TRUE
    5   4  TRUE    NA
    6   3  TRUE    NA
    7   2 FALSE FALSE
    8   3  TRUE FALSE
    9   3  TRUE  TRUE
    10  4 FALSE FALSE

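So id 6 appears only once (row 2), and that row has `prop2 = NA`. With the default `na.action`, that row is removed and id 6 disappears from the result.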

This should do the trick:

    aggregate(. ~ id, b, function(x) { sum(x, na.rm=TRUE)/length(x) }, na.action=NULL)

      id prop1 prop2
    1  2  0.00  0.00
    2  3  1.00  0.50
    3  4  0.25  0.25
    4  6  1.00  0.00
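
To see which rows the default throws away, here is a minimal sketch that applies `na.omit` by hand to the five-row data frame from the question. The data is typed out literally, so it does not depend on the random number generator. The helper name `prop_true` is mine, and `na.pass` (a base R function) is used in place of `NULL`; both leave the rows with `NA` in place. Treat this as an illustration of the row filtering, not as a description of how `aggregate` is implemented.

```r
# The question's data, typed out literally (no sample(), so no RNG dependence)
b <- data.frame(id    = c(3, 1, 2, 2, 3),
                prop1 = c(FALSE, FALSE, FALSE, FALSE, TRUE),
                prop2 = c(TRUE, NA, NA, FALSE, TRUE))

# Rows that survive na.omit, the default na.action of the formula method
kept <- na.omit(b)
print(kept)

# id 1 had a single row and it contained NA, so nothing is left for it;
# id 2 had two rows and the complete one (row 4) survives
print(sort(unique(kept$id)))

# Hypothetical helper: share of TRUE values, with NA counted in the denominator
prop_true <- function(x) sum(x, na.rm = TRUE) / length(x)

# Default behaviour: groups are formed from the complete rows only
default_res <- aggregate(. ~ id, b, prop_true)
print(default_res)

# Keep every row and let the function deal with NA itself
pass_res <- aggregate(. ~ id, b, prop_true, na.action = na.pass)
print(pass_res)
```

This also answers the question about `id=2` in the comments: only the row containing the `NA` is removed, not every row of that id, so a group vanishes only when none of its rows is complete.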

answered Feb 08 '17 at 20:44

Eric Lecoutre


By the way, this is the behavior of the formula method of `aggregate`. I found the earlier post where I learned this, which has useful details: http://stackoverflow.com/questions/16844613/na-values-and-r-aggregate-function – Eric Lecoutre Feb 08 '17 at 20:45
Why wouldn't `aggregate(. ~ id, b, function(x) { length(x[x == TRUE])/length(x)}, na.action=NULL)` give the same results? – user3206440 Feb 08 '17 at 21:19
It depends on what you want as the final result. Look at `x[x==TRUE]` when there are some `NA` values in `x`. With `na.action=NULL`, all values are passed to `function(x)`, so ultimately it depends on whether you want to include `NA` in the computations or not (hence my `sum(..., na.rm=TRUE)` to avoid counting them) – Eric Lecoutre Feb 09 '17 at 07:07
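
The point about `x[x==TRUE]` can be checked on a small vector. This is only a sketch; the vector `x` and the name `picked` are made up for illustration.

```r
x <- c(TRUE, NA, FALSE, TRUE)

# Comparing NA with TRUE gives NA, and indexing with NA keeps an NA element
picked <- x[x == TRUE]
print(picked)
print(length(picked))   # the NA is counted as if it were a TRUE

# sum() with na.rm = TRUE counts only the actual TRUE values
print(sum(x, na.rm = TRUE))

# which() drops NA, so this is an NA-safe variant of the original indexing
print(length(x[which(x)]))
```

So once the rows with `NA` are passed through, the `length(x[x == TRUE])` version counts each `NA` in the numerator. For id 6 in the answer's data, whose only `prop2` value is `NA`, it would return 1 rather than 0.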
