Like base::ifelse, dplyr::if_else evaluates both the "yes" and the "no" expression in full (see also thread 2 and thread 3). This produces pesky warnings, which are exactly what I hoped to avoid with this conditional approach: I am trying to parse conditionally, depending on the format of the data entry (in this case, a date).

But how can I avoid those warnings? I don't want suppressWarnings, because I still want a warning when the parsing fails entirely (my example includes such a case).

library(lubridate)
library(dplyr)
x <- c("20/01/2001", "02/28/01", "2000/01/01")

# Both if_else and case_when evaluate for all conditions
if_else(nchar(x) > 8, dmy(x), mdy(x))
#> Warning: 2 failed to parse.

#> Warning: 2 failed to parse.
#> [1] "2001-01-20" "2001-02-28" NA
case_when(nchar(x) > 8 ~ dmy(x), TRUE ~ mdy(x))
#> Warning: 2 failed to parse.

#> Warning: 2 failed to parse.
#> [1] "2001-01-20" "2001-02-28" NA
tjebo
  • Not a direct answer to your question, but given that you seem happy to load `lubridate`, do you have a reason for not using `parse_date_time` here? E.g. `parse_date_time(x, c("dmY", "mdy"))`, if you _don't_ want the last `Ymd` date to parse. Also check the `exact = TRUE` argument if you are interested in "specifying exact formats and avoiding training and guessing". – Henrik Feb 20 '21 at 12:57
  • @Henrik the reason why I wasn't using `parse_date_time` is very simple - I wasn't aware of it. This is very good for my case and solves my problem at hand. Thank you so much! The general question still kind of remains, but I assume there will be always other solutions depending on the actual problem at hand. – tjebo Feb 20 '21 at 13:05
  • I'd have used as.Date(x, tryFormats = c("%Y-%m-%d", "%Y/%m/%d", "%d/%m/%Y", "%m/%d/%y")). If you really wanted to do it your way, you would need another if nested in there that looks for "-". – CALUM Polwart Feb 20 '21 at 13:56

1 Answer

I think you're suggesting short-circuit logic within ifelse (and dplyr::if_else and data.table::fifelse), and I don't see a way to do it safely across all use-cases. For example, realize that dmy(x) is a single function call with a vector as its argument; implementing short-circuiting would require that the ifelse-replacement function know to subset the x vector and call dmy only on the elements that need it. While it might seem logical that one could specify the symbol(s) that need to be handled in this way, that starts complicating things a bit.
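To make that concrete, here is a minimal base-R sketch (no lubridate, so nothing warns). The helper parse_counted and the counter environment are hypothetical, introduced only to record how many elements each call receives; the formats are my assumed base-R equivalents of dmy and mdy for this particular input.

```r
# Hypothetical helper: parses dates and records how many elements it was given.
counter <- new.env()
counter$n <- integer(0)
parse_counted <- function(v, format) {
  counter$n <- c(counter$n, length(v))
  as.Date(v, format = format)
}

x <- c("20/01/2001", "02/28/01", "2000/01/01")
long <- nchar(x) > 8

# Eager: both branch expressions are evaluated on the whole of x.
# (Base ifelse also drops the Date class, so the value is not used here.)
invisible(ifelse(long, parse_counted(x, "%d/%m/%Y"), parse_counted(x, "%m/%d/%y")))
eager_calls <- counter$n     # 3 3

# Manual subsetting: each parser only sees the elements selected for it.
counter$n <- integer(0)
out <- rep(as.Date(NA), length(x))
out[long]  <- parse_counted(x[long],  "%d/%m/%Y")
out[!long] <- parse_counted(x[!long], "%m/%d/%y")
subset_calls <- counter$n    # 2 1
```

The second variant only works because the caller subsets x explicitly; a generic ifelse replacement cannot know which symbols inside the branch expressions should be subset.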

I think the best way to really do short-circuit-like processing here is a bit manual, controlling the vectorized elements yourself.

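# start with an all-NA Date vector of the right length
out <- rep(Sys.Date()[NA], length(x))
for (fun in list(dmy, mdy)) {
  isna <- is.na(out)               # elements not parsed yet
  if (!any(isna)) break            # everything parsed: skip remaining parsers
  out[isna] <- fun(x[isna])        # try this parser on the leftovers only
}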
# Warning:  2 failed to parse.
# Warning:  1 failed to parse.
out
# [1] "2001-01-20" "2001-02-28" NA          

This can be iterated over multiple functions (not just 2), or perhaps over arguments for a single function (such as formats to attempt with as.POSIXct or similar). (More than two would be closer to dplyr::case_when than to ifelse/dplyr::if_else ... which is one of the design benefits of case_when.)
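As a sketch of the "arguments for a single function" variant, the same loop can iterate over format strings passed to base as.Date instead of over lubridate parsers. The two formats below are assumptions chosen to match this example's input, not a general recommendation; note that as.Date returns NA silently rather than warning.

```r
x <- c("20/01/2001", "02/28/01", "2000/01/01")

# Candidate formats, tried in order (assumed for this input).
formats <- c("%d/%m/%Y", "%m/%d/%y")

out <- rep(as.Date(NA), length(x))
for (fmt in formats) {
  isna <- is.na(out)                            # elements not parsed yet
  if (!any(isna)) break                         # nothing left to do
  out[isna] <- as.Date(x[isna], format = fmt)   # only retry the leftovers
}
out
# [1] "2001-01-20" "2001-02-28" NA
```

Order matters: an element keeps the result of the first format that parses it, so the stricter or more specific formats should come first.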

With each pass through the for loop, only those elements that still produce an NA in out are processed in the next step; once non-NA, that element is "safe" and not touched again. Once all of out is non-NA, the loop breaks even if further candidate functions/formats are unused.

This still has a problem when one branch of the ifelse is a cumulative calculation, which requires the presence of the whole vector before it can run. That takes a bit more logic and control, and will preclude short-circuiting of the tests before it executes. (It would help to have the cumulative calc done on the first pass or before the for loop. Without an example, I hope you can see the potential complexity here.)
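A small illustration of that caveat, using made-up numbers (v, test, and the choice of cumsum are all hypothetical): subsetting before the cumulative call changes the answer, so the cumulative part has to be computed on the full vector first and subset afterwards.

```r
v <- c(1, 2, 3, 4, 5)
test <- v %% 2 == 0              # FALSE TRUE FALSE TRUE FALSE

# Subsetting first gives a different (wrong) running sum for the selected elements.
wrong <- cumsum(v[test])         # 2 6

# Compute the cumulative branch on the whole vector, then subset.
running <- cumsum(v)             # 1 3 6 10 15
out <- numeric(length(v))
out[test]  <- running[test]      # cumulative branch
out[!test] <- -v[!test]          # element-wise branch, safe to subset first
out
# [1]  -1   3  -3  10  -5
```

Here out matches ifelse(test, cumsum(v), -v), but the cumulative branch could not be short-circuited: it still had to touch every element.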

r2evans