228

I am using the function ifelse() to manipulate a date vector. I expected the result to be of class Date, and was surprised to get a numeric vector instead. Here is an example:

dates <- as.Date(c('2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04', '2011-01-05'))
dates <- ifelse(dates == '2011-01-01', dates - 1, dates)
str(dates)

This is especially surprising because performing the operation across the entire vector returns a Date object.

dates <- as.Date(c('2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04','2011-01-05'))
dates <- dates - 1
str(dates)

Should I be using some other function to operate on Date vectors? If so, what function? If not, how do I force ifelse to return a vector of the same type as the input?

The help page for ifelse indicates that this is a feature, not a bug, but I'm still struggling to find an explanation for what I found to be surprising behavior.

Henrik
Zach
  • 13
    There is now a function `if_else()` in the dplyr package that can substitute for `ifelse` while retaining correct classes of Date objects - it's [posted below](http://stackoverflow.com/a/38093096/4470365) as a recent answer. I'm bringing attention to it here as it solves this problem by providing a function that is unit-tested and documented in a CRAN package, unlike many other answers that (as of this comment) were ranked ahead of it. – Sam Firke Aug 26 '16 at 17:15

7 Answers

191

You may use data.table::fifelse (data.table >= 1.12.3) or dplyr::if_else.


data.table::fifelse

Unlike ifelse, fifelse preserves the type and class of the inputs.

library(data.table)
dates <- fifelse(dates == '2011-01-01', dates - 1, dates)
str(dates)
# Date[1:5], format: "2010-12-31" "2011-01-02" "2011-01-03" "2011-01-04" "2011-01-05"

dplyr::if_else

From dplyr 0.5.0 release notes:

"[if_else has] stricter semantics than ifelse(): the true and false arguments must be the same type. This gives a less surprising return type, and preserves S3 vectors like dates."

library(dplyr)
dates <- if_else(dates == '2011-01-01', dates - 1, dates)
str(dates)
# Date[1:5], format: "2010-12-31" "2011-01-02" "2011-01-03" "2011-01-04" "2011-01-05" 
Community
Henrik
  • 4
    Is there a way to have one of the arguments of the `if_else` be NA? I've attempted the logical `NA_` options and nothing is sticking and I do not believe there is an `NA_double_` – roarkz Jun 27 '17 at 13:28
  • 17
    @Zak One possibility is to wrap `NA` in `as.Date`. – Henrik Jul 09 '17 at 19:54
  • There is `NA_real_`, @roarkz. and @Henrik, your comment here solved my problem. – BLT May 07 '18 at 17:51
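
For the NA question in the comments above, here is a minimal sketch, assuming dplyr is installed (the variable names are illustrative). Wrapping `NA` in `as.Date()` gives the true branch the same type as the false branch, which is what `if_else()` checks.

```r
library(dplyr)

dates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03"))

# as.Date(NA) is a missing value that already carries class "Date",
# so both branches passed to if_else() have the same type
res <- if_else(dates == "2011-01-01", as.Date(NA), dates)

class(res)
# [1] "Date"
is.na(res)
# [1]  TRUE FALSE FALSE
```
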
70

It relates to the documented Value of ifelse:

A vector of the same length and attributes (including dimensions and "class") as test and data values from the values of yes or no. The mode of the answer will be coerced from logical to accommodate first any values taken from yes and then any values taken from no.

Boiled down to its implications, ifelse makes factors lose their levels and Dates lose their class and only their mode ("numeric") is restored. Try this instead:

dates[dates == '2011-01-01'] <- dates[dates == '2011-01-01'] - 1
str(dates)
# Date[1:5], format: "2010-12-31" "2011-01-02" "2011-01-03" "2011-01-04" "2011-01-05"
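
To make the coercion described above concrete, here is a minimal base R sketch (variable names are illustrative). A Date is stored as a number of days since 1970-01-01 with a class attribute on top. ifelse() builds its result from test, so that attribute never reaches the output.

```r
dates <- as.Date(c("2011-01-01", "2011-01-02", "2011-01-03"))

# Underneath the class attribute, a Date is a plain number of days since 1970-01-01
unclass(dates)
# [1] 14975 14976 14977

# ifelse() starts from the logical `test` vector, so the "Date" class is never copied over
raw <- ifelse(dates == "2011-01-01", dates - 1, dates)
class(raw)
# [1] "numeric"

# The values themselves are intact, so the class can be put back explicitly
restored <- as.Date(raw, origin = "1970-01-01")
restored
# [1] "2010-12-31" "2011-01-02" "2011-01-03"
```
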

You could create a safe.ifelse:

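safe.ifelse <- function(cond, yes, no) {
  class.y <- class(yes)        # remember the class of the `yes` values
  X <- ifelse(cond, yes, no)   # ifelse() returns the bare values without that class
  class(X) <- class.y          # put the class back
  return(X)
}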

safe.ifelse(dates == '2011-01-01', dates - 1, dates)
# [1] "2010-12-31" "2011-01-02" "2011-01-03" "2011-01-04" "2011-01-05"

A later note: I see that Hadley has built an if_else into the magrittr/dplyr/tidyr complex of data-shaping packages that does preserve the class of the consequent.

IRTFM
  • 42
    Somewhat more elegant version: `safe.ifelse <- function(cond, yes, no) structure(ifelse(cond, yes, no), class = class(yes))` – hadley Jul 13 '11 at 03:43
  • 7
    Nice. Do you see there any reason why that is not the default behavior? – IRTFM Jul 13 '11 at 04:34
  • just be careful what you put in "yes" because I had NA and it didn't work. Probably better to pass the class as a parameter than assuming it is the class of the "yes" condition. – Denis Feb 26 '19 at 14:59
  • 1
    I'm not sure what that last comment means. Just because something has an NA value doesn't mean it cannot have a class. – IRTFM Feb 26 '19 at 18:19
  • @IRTFM I could see a problem with `x <- 1:10; safe.ifelse(x < 5, NA, x)`, the assumption that `class(NA)` is the desired class of the result isn't necessarily correct. – Gregor Thomas May 03 '23 at 15:15
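
The concern in the last few comments can be sketched as follows. `safe_ifelse_as` and its `template` argument are hypothetical names introduced here for illustration. The idea is to take the class from an explicitly chosen object rather than always from `yes`, so that a bare `NA` in `yes` does not dictate the result class.

```r
# A bare NA is logical, so class(yes) is not a useful class to restore
class(NA)
# [1] "logical"

# Hypothetical variant: the class is copied from `template`, which defaults to `yes`
safe_ifelse_as <- function(cond, yes, no, template = yes) {
  out <- ifelse(cond, yes, no)
  class(out) <- class(template)
  out
}

dates <- as.Date(c("2011-01-01", "2011-01-02"))

# `yes` is a bare NA, so the Date class is taken from `dates` instead
res <- safe_ifelse_as(dates == "2011-01-01", NA, dates, template = dates)
class(res)
# [1] "Date"
```

This is only a sketch for simple S3 classes such as Date; it does not handle factors.
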
17

DWin's explanation is spot on. I fiddled and fought with this for a while before I realized I could simply force the class after the ifelse statement:

dates <- as.Date(c('2011-01-01','2011-01-02','2011-01-03','2011-01-04','2011-01-05'))
dates <- ifelse(dates=='2011-01-01',dates-1,dates)
str(dates)
class(dates)<- "Date"
str(dates)

At first this felt a little "hackish" to me. But now I just think of it as a small price to pay for the performance returns that I get from ifelse(). Plus it's still a lot more concise than a loop.

JD Long
  • this (nice, if, yes, hackish) technique seems to also help with the fact that R's `for` statement assigns the *value* of items in `VECTOR` to `NAME`, but not their *class*. – Greg Minshall Jun 12 '19 at 07:29
11

This doesn't work because ifelse() drops the Date class and returns the underlying numeric values. A simple workaround is to convert both branches to character before evaluating, then convert the result back to Date.

dates <- as.Date(c('2011-01-01','2011-01-02','2011-01-03','2011-01-04','2011-01-05'))
dates_new <- dates - 1
dates <- as.Date(ifelse(dates =='2011-01-01',as.character(dates_new),as.character(dates)))

This wouldn't require any library apart from base R.

6

The suggested method does not work with factor columns. I'd like to suggest this improvement:

safe.ifelse <- function(cond, yes, no) {
  class.y <- class(yes)
  if (class.y == "factor") {
    levels.y = levels(yes)
  }
  X <- ifelse(cond,yes,no)
  if (class.y == "factor") {
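    # ifelse() returns the integer codes; map code i back to the i-th original level
    X = factor(X, levels = seq_along(levels.y), labels = levels.y)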
  } else {
    class(X) <- class.y
  }
  return(X)
}

By the way: ifelse sucks... with great power comes great responsibility, i.e. type conversions of 1x1 matrices and/or numerics [when they should be added for example] is ok to me but this type conversion in ifelse is clearly unwanted. I bumped into the very same 'bug' of ifelse multiple times now and it just keeps on stealing my time :-(

Fabian Werner
  • This is the only solution that works for me for factors. – bshor Jan 28 '16 at 18:39
  • I would have thought that the levels to be returned would be the union of the levels of `yes` and `no` and that you would first check to see that they were both factors. You would probably need to convert to character and then rebundle with the "unionized"-levels. – IRTFM Sep 27 '16 at 15:56
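
One way to sketch the suggestion in the comment above, assuming both `yes` and `no` are factors. `factor_ifelse` is a hypothetical name used for illustration, not a base R function.

```r
# Hypothetical helper: go through character, then rebuild with the union of levels
factor_ifelse <- function(cond, yes, no) {
  stopifnot(is.factor(yes), is.factor(no))
  all_levels <- union(levels(yes), levels(no))
  # Comparing labels rather than integer codes avoids mixing up the two level sets
  picked <- ifelse(cond, as.character(yes), as.character(no))
  factor(picked, levels = all_levels)
}

f1 <- factor(c("a", "b", "c"))
f2 <- factor(c("x", "y", "z"))

res <- factor_ifelse(c(TRUE, FALSE, TRUE), f1, f2)
res
# [1] a y c
# Levels: a b c x y z
```
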
5

The answer provided by @fabian-werner is great, but objects can have multiple classes, and "factor" may not necessarily be the first one returned by class(yes), so I suggest this small modification to check all class attributes:

safe.ifelse <- function(cond, yes, no) {
      class.y <- class(yes)
      if ("factor" %in% class.y) {  # Note the small condition change here
        levels.y = levels(yes)
      }
      X <- ifelse(cond,yes,no)
      if ("factor" %in% class.y) {  # Note the small condition change here
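        # ifelse() returns the integer codes; map code i back to the i-th original level
        X = factor(X, levels = seq_along(levels.y), labels = levels.y)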
      } else {
        class(X) <- class.y
      }
      return(X)
    }

I have also submitted a request with the R Development team to add a documented option to have base::ifelse() preserve attributes based on user selection of which attributes to preserve. The request is here: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16609 - It has already been flagged as "WONTFIX" on the grounds that it has always been the way it is now, but I have provided a follow-up argument on why a simple addition might save a lot of R users headaches. Perhaps your "+1" in that bug thread will encourage the R Core team to take a second look.

EDIT: Here's a better version that allows the user to specify which attributes to preserve: "cond" (the default ifelse() behaviour), "yes" (the behaviour of the code above), or "no" (for cases where the attributes of the "no" value are a better fit):

safe_ifelse <- function(cond, yes, no, preserved_attributes = "yes") {
    # Capture the user's choice for which attributes to preserve in return value
    preserved           <- switch(EXPR = preserved_attributes, "cond" = cond,
                                                               "yes"  = yes,
                                                               "no"   = no);
    # Preserve the desired values and check if object is a factor
    preserved_class     <- class(preserved);
    preserved_levels    <- levels(preserved);
    preserved_is_factor <- "factor" %in% preserved_class;

    # We have to use base::ifelse() for its vectorized properties
    # If we do our own if() {} else {}, then it will only work on first variable in a list
    return_obj <- ifelse(cond, yes, no);

    # If the object whose attributes we want to retain is a factor
    # Rebuild the factor from the integer codes that ifelse() returns,
    # mapping code i back to the i-th preserved level
    # Then check to see if it's also one or more classes in addition to "factor"
    # If so, set the classes, which will preserve "factor" too
    if (preserved_is_factor) {
        return_obj          <- factor(return_obj,
                                      levels = seq_along(preserved_levels),
                                      labels = preserved_levels);
        if (length(preserved_class) > 1) {
          class(return_obj) <- preserved_class;
        }
    }
    # In all cases we want to preserve the class of the chosen object, so set it here
    else {
        class(return_obj)   <- preserved_class;
    }
    return(return_obj);

} # End safe_ifelse function
Mekki MacAulay
0

Why not use indexing here?

> dates <- as.Date(c('2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04', '2011-01-05'))
> dates[dates == '2011-01-01'] <- NA
> str(dates)
 Date[1:5], format: NA "2011-01-02" "2011-01-03" "2011-01-04" "2011-01-05"
sashahafner