I am using lubridate to parse date variables and dplyr to create a new date variable.

library(lubridate)
library(dplyr)

Let df be my data frame, with two variables, date1 and date2. I want to create a new variable, date, that takes the value of date1, or the value of date2 when date1 is missing.

df <- data.frame(date1 = c("24/01/2016",NA,"22/07/2016"),
                 date2 = c("31/01/2016","09/02/2017",NA),
                 stringsAsFactors=FALSE)

The above command gives:

       date1      date2
1 24/01/2016 31/01/2016
2       <NA> 09/02/2017
3 22/07/2016       <NA>

I tried the following, which I thought would give the desired result. However, the new date variable comes out as numeric.

df %>% 
   mutate_at(vars(date1,date2),dmy) %>% 
   mutate(date=ifelse(is.na(date1),date2,date1))

       date1      date2  date
1 2016-01-24 2016-01-31 16824
2       <NA> 2017-02-09 17206
3 2016-07-22       <NA> 17004

I want:

       date1      date2       date
1 2016-01-24 2016-01-31 2016-01-24
2       <NA> 2017-02-09 2017-02-09
3 2016-07-22       <NA> 2016-07-22

How do I solve this problem?

alistaire
HNSKD

1 Answer

Use dplyr::if_else instead of base::ifelse. According to ?if_else, it is stricter about types:

Compared to the base ifelse(), this function is more strict. It checks that true and false are the same type. This strictness makes the output type more predictable, and makes it somewhat faster.

df %>% 
      mutate_at(vars(date1,date2),dmy) %>% 
      mutate(date=if_else(is.na(date1),date2,date1))

#       date1      date2       date
#1 2016-01-24 2016-01-31 2016-01-24
#2       <NA> 2017-02-09 2017-02-09
#3 2016-07-22       <NA> 2016-07-22
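
The strictness described in the quoted documentation can be checked on two small vectors. This is a minimal sketch, not part of the original answer: the names d, fallback, ok and mixed are made up for illustration, and the exact error message depends on the installed dplyr version, so only the fact that an error occurs is captured.

```r
library(dplyr)

# Hypothetical example vectors (not from the question's data frame)
d        <- as.Date(c("2016-01-24", NA))
fallback <- as.Date(c("2016-01-31", "2017-02-09"))

# Both branches are Dates, so the Date class survives
ok <- if_else(is.na(d), fallback, d)
class(ok)

# A character branch mixed with a Date branch is rejected
# instead of being silently coerced
mixed <- tryCatch(
  if_else(is.na(d), "missing", d),
  error = function(e) "error"
)
mixed
```

Base ifelse would accept the mixed call and return a character vector, which is the kind of silent coercion if_else is meant to prevent.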

Another option is coalesce, which takes the value from date1 when it is not NA and otherwise falls back to date2:

df %>% 
      mutate_at(vars(date1,date2),dmy) %>% 
      mutate(date = coalesce(date1, date2))

#       date1      date2       date
#1 2016-01-24 2016-01-31 2016-01-24
#2       <NA> 2017-02-09 2017-02-09
#3 2016-07-22       <NA> 2016-07-22

If you want to keep your original code, wrap the ifelse call in as.Date. ifelse strips the class from its result and keeps only the underlying data, i.e. the number of days since 1970-01-01:

df %>% 
      mutate_at(vars(date1,date2),dmy) %>% 
      mutate(date=as.Date(ifelse(is.na(date1),date2,date1), origin = "1970-01-01"))
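
To see why the numbers appear, it can help to run ifelse on bare Date vectors outside of mutate. The following is a small base R sketch using the same dates as the question; the names d1, d2, raw and restored are illustrative only.

```r
# Same dates as in the question, already parsed
d1 <- as.Date(c("2016-01-24", NA, "2016-07-22"))
d2 <- as.Date(c("2016-01-31", "2017-02-09", NA))

# ifelse drops the "Date" class and returns the underlying day counts
raw <- ifelse(is.na(d1), d2, d1)
class(raw)   # "numeric"
raw          # 16824 17206 17004

# Re-attach the class by telling as.Date where day 0 is
restored <- as.Date(raw, origin = "1970-01-01")
class(restored)   # "Date"
```

The values in raw match the date column printed in the question, which confirms they are day counts rather than corrupted data.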
Psidom