
In R, there are the operators & and && (and likewise | and ||), where & is vectorized and && is not. Another difference is that & always evaluates both arguments, while && short-circuits (sometimes called lazy evaluation), meaning F && fun(x) won't call fun(x).
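A minimal illustration of that difference, using a hypothetical helper noisy() (not part of the question) that counts how often it is called:

```r
# Hypothetical helper: counts its calls, then returns its input unchanged.
calls <- 0
noisy <- function(x) {
  calls <<- calls + 1
  x
}

# && short-circuits: the left side is FALSE, so noisy() is never called.
r_scalar <- FALSE && noisy(TRUE)
calls_after_scalar <- calls

# & is vectorized and evaluates both sides in full, even though
# every element of the left side is already FALSE.
r_vector <- c(FALSE, FALSE) & noisy(c(TRUE, TRUE))
calls_after_vector <- calls
```

After the first expression the counter is still 0; after the second it is 1.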

What I'm looking for is a way to combine those two, so that I can call, for example:

input <- data.frame(valid=c(T,T,T,F,F), value=c('1','2','3','huh',14), stringsAsFactors = F)
# A function to check evenness, but which prints an alert if the value is more than 10
fun <- function(x) {
  if(any(as.numeric(x)>10))
    cat(as.numeric(x)[as.numeric(x)>10], '')
  return(as.numeric(x) %% 2==0)
}
cat("Numbers over 10 (unexpected):\n")
pass <- input$valid & fun(input$value)
cat("\nAnd in total we have",sum(pass),"even numbers") 

Here, I get warnings, because 'huh' can't be cast to numeric, even though the result for 'huh' is never needed.

What I'd like is behaviour similar to this:

pass2 <- rep(FALSE, nrow(input))
cat("Numbers over 10 (unexpected):\n")
for(n in 1:nrow(input)) {
  if(input$valid[n]) pass2[n] <- fun(input$value[n])
}
cat("\nAnd in total we have",sum(pass2),"even (valid) numbers")

In this example, it would be easy to adapt fun, or to write around it, but in my daily work I often find use cases with more difficult conditionals, and various functions that I don't want to adapt every time.

Is there any way to do what I want to do, or do I really need to return to non-vectorised functions and/or for-loops?


Some approaches I tried myself that didn't work out. First, mapply:

mapply(`&&`, input$valid, fun(input$value))

But fun is still evaluated. It's interesting to note that the returned value IS ignored when necessary by the && if you compare the following:

mapply(`&&`, c(F,F), c(T, print('Huh?')))
mapply(`&&`, c(T,T), c(T, print('Huh?')))
mapply(`&`, c(F,F), c(T, print('Huh?')))

But in all cases the print is evaluated; I guess mapply forces evaluation.
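The argument vector (for example c(T, print('Huh?'))) is built in full before mapply ever sees it, so && only gets to choose between values that already exist. A possible workaround, sketched here with a simplified stand-in is_even() for fun (an assumption, not the original function), is to move the call inside the function given to mapply:

```r
input <- data.frame(valid = c(TRUE, TRUE, TRUE, FALSE, FALSE),
                    value = c("1", "2", "3", "huh", "14"),
                    stringsAsFactors = FALSE)

# Simplified stand-in for fun from the question: TRUE for even numbers.
is_even <- function(x) as.numeric(x) %% 2 == 0

# is_even() is called inside the anonymous function, one element at a time,
# so && can skip it whenever valid is FALSE; "huh" is never converted.
pass <- mapply(function(v, x) v && is_even(x), input$valid, input$value)
```

This behaves like the for-loop: it short-circuits per element, but it also calls is_even() once per valid element instead of once on a vector.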

I also tried this:

`%&%` <- function(a,b) {
  res <- rep(FALSE, times=length(a))
  res[is.na(a)|a] <- a[is.na(a)|a] & b[is.na(a)|a]
  res  # return the full vector; the assignment alone would invisibly return only the assigned subset
}
input$valid %&% fun(input$value)

thinking I'd only use b's values if a was non-false. But it looks like almost the same thing is happening here: b is evaluated first, only then subsetted... (yes, I know I should check the lengths too, I was trying this because maybe the length-checking was forcing evaluation)
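As far as I understand it, a function argument is a promise that is evaluated as a whole the first time it is referenced, so laziness here is all-or-nothing rather than per element. A small sketch with a hypothetical operator %and% (the name and the early return are assumptions for illustration):

```r
touched <- FALSE

# Hypothetical operator: returns early when no element of a can yield TRUE.
`%and%` <- function(a, b) {
  if (!any(is.na(a) | a)) return(a)  # b is never referenced, so never evaluated
  a & b                              # first reference to b evaluates the whole expression
}

# All of a is FALSE: the right-hand expression is skipped entirely.
r_none <- c(FALSE, FALSE) %and% { touched <- TRUE; c(TRUE, TRUE) }
touched_after_none <- touched

# One element of a is TRUE: the right-hand expression is evaluated in full.
r_some <- c(TRUE, FALSE) %and% { touched <- TRUE; c(TRUE, TRUE) }
touched_after_some <- touched
```

So an infix operator can skip b completely, but it cannot evaluate b for only some elements.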

Emil Bode
  • You might consider including a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) in your question. That will make it easier for others to help you. – Jaap May 07 '18 at 10:24

2 Answers


What you can do is write a function factory that wraps an existing function in a new one that treats NA as FALSE:

bool_noNA <- function(fun) {
  function(x, valid) {
    if (missing(valid)) valid <- !is.na(x)
    res  <- logical(length(x))
    res[valid] <- fun(x[valid])
    res
  }
}

An example of use:

is_odd <- function(x) x %% 2 == 1    
is_odd(c(3:5, NA))

is_odd_noNA <- bool_noNA(is_odd)
is_odd_noNA(c(3:5, NA))
is_odd_noNA(c(3:5, NA), valid = c(T, F, F, F))
is_odd_noNA(c(3:5, NA), valid = c(F, T, F, F))
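Applied to the data from the question, the same wrapper might be used as follows. This is only a sketch: is_even() is a simplified stand-in for fun, and bool_noNA is repeated so the snippet runs on its own.

```r
bool_noNA <- function(fun) {
  function(x, valid) {
    if (missing(valid)) valid <- !is.na(x)
    res <- logical(length(x))    # all FALSE by default
    res[valid] <- fun(x[valid])  # fun only ever sees the valid elements
    res
  }
}

# Simplified stand-in for fun from the question: TRUE for even numbers.
is_even <- function(x) as.numeric(x) %% 2 == 0
is_even_valid <- bool_noNA(is_even)

value <- c("1", "2", "3", "huh", "14")
valid <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

# "huh" is filtered out before as.numeric() runs, so no coercion warning.
pass <- is_even_valid(value, valid = valid)
```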
F. Privé
  • Thanks, I think your answer helps! My question wasn't specifically for getting rid of NA's (and I've since updated my question), but I think the idea of not passing on fun(x), but instead only evaluating fun(x[Conditional]) could work. I'll try to see if I can come up with a more general solution. – Emil Bode May 07 '18 at 12:06

Based on F. Privé's answer, I've found a more general solution:

LazyAnd <- function(a,b, fun, ...) {
  a[is.na(a)|a] <- a[is.na(a)|a] & fun(b[is.na(a)|a], ...)
  return(a)
}
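A usage sketch, with the same logic written so the index is computed once. The name lazy_and and the helper is_even() are assumptions for illustration.

```r
# Same idea as LazyAnd, storing the index instead of recomputing it.
lazy_and <- function(a, b, fun, ...) {
  idx <- is.na(a) | a                 # elements whose result is not already FALSE
  a[idx] <- a[idx] & fun(b[idx], ...) # fun only sees the elements that matter
  a
}

# Simplified stand-in for fun from the question: TRUE for even numbers.
is_even <- function(x) as.numeric(x) %% 2 == 0

valid <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
value <- c("1", "2", "3", "huh", "14")
pass <- lazy_and(valid, value, is_even)

# An NA in a is still passed on to fun and combined with &.
pass_na <- lazy_and(c(TRUE, NA, FALSE), c("2", "4", "huh"), is_even)
```

In the second call the NA element stays NA, because NA & TRUE is NA, while "huh" is again never converted.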

The disadvantage is that it's no longer an infix operator, which might make calls more cluttered. But I don't think it's possible to call fun(x) without evaluating all of fun(x), even though that is what I'd want: otherwise functions such as cumsum would be poorly defined, since cumsum(c(1:100)[10:20]) of course gives different results than cumsum(c(1:100))[10:20].
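To make the cumsum point concrete:

```r
x <- 1:100

# Running totals over the full vector, then pick elements 10 to 20.
subset_after <- cumsum(x)[10:20]

# Running totals over elements 10 to 20 only.
subset_before <- cumsum(x[10:20])

first_after  <- subset_after[1]   # sum(1:10)
first_before <- subset_before[1]  # just x[10]
```

The first elements are already 55 and 10 respectively, so subsetting before or after the call is not interchangeable for every function.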

And finally, if people want to reuse my code, here's the lazy, vectorized OR as well:

LazyOr <- function(a,b, fun, ...) {
  a[is.na(a)|!a] <- a[is.na(a)|!a] | fun(b[is.na(a)|!a], ...)
  return(a)
}
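A usage sketch for the OR case, assuming the intended semantics are a | fun(b) on the elements where a is not already TRUE. The name lazy_or and the helper is_even() are assumptions for illustration.

```r
# Lazy vectorized OR: fun is only applied where a is FALSE or NA.
lazy_or <- function(a, b, fun, ...) {
  idx <- is.na(a) | !a                # elements whose result is not already TRUE
  a[idx] <- a[idx] | fun(b[idx], ...)
  a
}

# Simplified stand-in for fun from the question: TRUE for even numbers.
is_even <- function(x) as.numeric(x) %% 2 == 0

# The first element is already TRUE, so "huh" is never passed to is_even().
pass <- lazy_or(c(TRUE, FALSE, FALSE), c("huh", "2", "3"), is_even)
```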
Emil Bode