4

So we have this behaviour:

any(c(TRUE, FALSE, NA))
#> [1] TRUE
any(c(TRUE, NA))
#> [1] TRUE
any(c(FALSE, NA))
#> [1] NA

Does anyone know the rationale for returning NA instead of FALSE? IMO the function should test for the presence of non-FALSE values, and NA is not one.

geotheory
  • 3
    Because `FALSE|NA` returns NA – akrun Feb 14 '17 at 12:44
  • 4
    From the values section of the help file: "The value returned is TRUE if at least one of the values in x is TRUE, and FALSE if all of the values in x are FALSE (including if there are no values). Otherwise the value is NA." – lmo Feb 14 '17 at 12:45
  • 2
    Use `any(na.omit(c(FALSE, NA)))` to always get TRUE or FALSE. – Axeman Feb 14 '17 at 12:47
  • 2
    `any` also has an `na.rm` parameter. Set it to `TRUE` to remove `NA`s. – Ronak Shah Feb 14 '17 at 12:49
  • 2
    For what it's worth, R's behavior is quite sensible if you consider that NA is an unknown value, and could therefore be TRUE or FALSE. This is why `c(TRUE, NA)` is determined, but `c(FALSE, NA)` is not. – Axeman Feb 14 '17 at 12:51
  • Also see [here](http://stackoverflow.com/a/1535492/4341440). – Axeman Feb 14 '17 at 12:54
  • Because `NA` is an _unknown_ value and, being in a `logical` vector, it _might_ be either `TRUE` or `FALSE`. So, when you ask `any(c(FALSE, NA))`, R can't tell: the answer might be either, depending on the actual value of the second element. – nicola Feb 14 '17 at 12:54
  • Background reading: https://en.wikipedia.org/wiki/Three-valued_logic, https://en.wikipedia.org/wiki/Null_(SQL) – Hong Ooi Feb 14 '17 at 12:58
  • The [relevant source code](https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/main/logic.c#L392-L399), called by [do_logic3](https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/main/logic.c#L452), which determines the behavior of these expressions: `any(c(TRUE, NA)); any(c(FALSE, NA)); all(c(FALSE, NA)); all(c(TRUE, NA))`. – nrussell Feb 14 '17 at 12:59

3 Answers

5

This behavior is explained in the values section of the help file:

The value returned is TRUE if at least one of the values in x is TRUE, and FALSE if all of the values in x are FALSE (including if there are no values). Otherwise the value is NA.

As you note, this seems to differ from the behavior of more commonly used functions such as sum and mean, which return NA whenever their vector argument contains an NA. This apparent inconsistency is cleared up by joran's answer, which refers to the documentation from ?Logic. To requote:

NA is a valid logical object. Where a component of x or y is NA, the result will be NA if the outcome is ambiguous. In other words NA & TRUE evaluates to NA, but NA & FALSE evaluates to FALSE. See the examples below.

So when the outcome is ambiguous, NA is the output: for example, the mean of a vector that contains NA, or NA | FALSE, where the missing value might be TRUE. In other cases, such as any(c(TRUE, NA)) or TRUE | NA, the outcome is unambiguous despite the presence of a missing value. This logic may be clearer in @Floo0's answer and in some of the comments to the question.
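The rule quoted from ?Logic can be checked directly at the console. The following is only an illustrative sketch in base R, with the expected results shown as comments; the particular input vectors are chosen for illustration and are not part of the help file.

```r
# Scalar operators: NA appears only when the missing value could change the result
TRUE  | NA   # TRUE
FALSE | NA   # NA
FALSE & NA   # FALSE
TRUE  & NA   # NA

# any() and all() apply the same rule across a whole vector
any(c(TRUE, NA))    # TRUE
any(c(FALSE, NA))   # NA
all(c(FALSE, NA))   # FALSE
all(c(TRUE, NA))    # NA
any(logical(0))     # FALSE (no values at all)

# A numeric summary cannot be pinned down once a value is missing
mean(c(1, 2, NA))   # NA
```

In each NA case, filling in the missing value with TRUE and then with FALSE would give two different results, which is what "ambiguous" means here.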

lmo
5

I might be mistaken but the logic here is:

NA means an unknown value. So the question

Is any value of (FALSE, NA) true?

is answered with "I don't know", i.e. NA, because NA could be TRUE, but that is unknown at the moment you are asking.

Take the question

Is any value of (TRUE, NA) true?

This is answered with TRUE, because the first value is certainly TRUE.
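The "unknown value" reading can be made concrete with a small sketch. `any_by_cases` is a hypothetical helper written for this illustration, not how R implements `any`: it answers the question twice, once assuming every NA is TRUE and once assuming every NA is FALSE, and reports NA only if the two answers disagree. Checking just those two extremes is enough here because turning more elements TRUE can only move the result of `any` from FALSE to TRUE, never back.

```r
# Hypothetical helper: treat each NA as "could be TRUE or FALSE"
any_by_cases <- function(x) {
  if_all_true  <- any(replace(x, is.na(x), TRUE))   # every NA turns out TRUE
  if_all_false <- any(replace(x, is.na(x), FALSE))  # every NA turns out FALSE
  # Same answer in both cases: the result is known; otherwise it is unknown
  if (identical(if_all_true, if_all_false)) if_all_true else NA
}

any_by_cases(c(TRUE, NA))      # TRUE
any_by_cases(c(FALSE, NA))     # NA
any_by_cases(c(FALSE, FALSE))  # FALSE
```

For these inputs the helper agrees with the built-in `any`.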

Rentrop
  • 1
    This is actually the answer (made a similar comment, whose content is actually what @Axeman said before me). – nicola Feb 14 '17 at 12:56
  • @nicola: Sorry, I didn't see your comment... Do you want me to delete my answer so you can post your own? – Rentrop Feb 14 '17 at 13:00
  • 2
    Not at all. This answer needs to be here, since in my opinion the accepted answer doesn't stress the reasons why this is totally expected. It matters very little who posted it. – nicola Feb 14 '17 at 13:14
4

I would wrap the call in isTRUE, which yields the desired result:

> any(c(FALSE, NA))
[1] NA
> isTRUE(any(c(FALSE, NA)))
[1] FALSE

From the documentation:

‘isTRUE(x)’ is an abbreviation of ‘identical(TRUE, x)’, and so is true if and only if ‘x’ is a length-one logical vector whose only element is ‘TRUE’ and which has no attributes (not even names).
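For comparison, here is a short sketch that puts the `isTRUE` wrapper next to the `na.rm` and `na.omit` options mentioned in the comments on the question. The vector `x` is just an example input; all three variants treat the unknown value as if it were absent, so they answer a slightly different question from plain `any(x)`.

```r
x <- c(FALSE, NA)

any(x)                 # NA: the missing value might be TRUE
isTRUE(any(x))         # FALSE: anything other than a single TRUE counts as FALSE
any(x, na.rm = TRUE)   # FALSE: NAs are dropped before testing
any(na.omit(x))        # FALSE: same idea, dropping NAs explicitly
```

This matters most inside `if ()`, where an NA condition raises an error rather than being treated as FALSE.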

Paul Hiemstra