I'm having trouble understanding what the tapply function does when the FUN argument is NULL.

The documentation says:

If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array tapply normally produces.

For example, what does the following example from the documentation do?

ind <- list(c(1, 2, 2), c("A", "A", "B"))
tapply(1:3, ind) #-> the split vector

I don't understand the results:

[1] 1 2 4

Thanks.

Carmellose
  • See `interaction(ind)`, which generates all combinations of the factor levels; in your example, the output you get corresponds to `levels(interaction(ind))[c(1, 2, 4)]`, i.e. the level each element of "X" falls into according to the "INDEX" argument. See also `tapply(1:5, list(c(1, 2, 2, 2, 1), c("A", "A", "B", "B", "A")))`, where it is clearer that `tapply` is grouping "X" by "INDEX". – alexis_laz May 23 '16 at 13:17
  • The result of `ix <- tapply(X, INDEX)` does not depend on `X` -- only on `INDEX` -- and, in particular, if `INDEX` is a list then `ix` equals `as.integer(do.call(interaction, INDEX))` – G. Grothendieck May 23 '16 at 14:28

1 Answer

If you run tapply with an actual function (not NULL), say sum, as in the help page, you'll see that the result is a 2-dimensional array with NA in one cell:

res <- tapply(1:3, ind, sum)
res
  A  B
1 1 NA
2 2  3

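This means that one combination of factors, namely (1, B), is absent. When FUN is NULL, tapply instead returns, for each element of X, the index of the array cell that its factor combination maps to. In this example every element falls into its own cell, so the result matches the positions of the non-NA cells: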

> which(!is.na(res))
[1] 1 2 4
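
To make the "subscript" wording in the documentation more concrete, here is a small sketch that reuses the five-element example from the comments. The names `X`, `INDEX`, `idx` and `sums` are only illustrative. It indexes the array of group sums with the vector returned when FUN is NULL, which gives each element its own group's sum:

```r
X <- 1:5
INDEX <- list(c(1, 2, 2, 2, 1), c("A", "A", "B", "B", "A"))

# FUN = NULL: one cell index per element of X
idx <- tapply(X, INDEX)

# The 2 x 2 array of group sums; the (1, B) cell is NA because it is empty
sums <- tapply(X, INDEX, sum)

idx        # 1 2 4 4 1
sums[idx]  # 6 2 7 7 6, i.e. the group sum for each element of X
```

Elements 3 and 4 share the cell (2, B), so they share index 4. With repeated groups like this, the result is no longer the same as `which(!is.na(sums))`.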

One caveat: the function itself can return NA, as in the following toy example:

> f <- function(x){
      if(x[[1]] == 1) return(NA)
      return(sum(x))
  }
> tapply(1:3, ind, f)
   A  B
1 NA NA
2  2  3

So, in general, NA doesn't mean that a factor combination is absent.
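
If you need to know which combinations are really absent, one option is to count the observations per cell rather than inspect the function's output. This is only a sketch, not something taken from the documentation example, and `counts` and `present` are illustrative names:

```r
ind <- list(c(1, 2, 2), c("A", "A", "B"))

# Number of observations for every combination of the two factors
counts <- table(ind[[1]], ind[[2]])

# Linear indices of the cells that contain at least one observation
present <- which(as.vector(counts) > 0)

counts   # the (1, B) cell has count 0
present  # 1 2 4
```

Here cell 1, i.e. (1, A), is reported as present even though the toy function `f` returns NA for it.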

Iaroslav Domin