I have a data set with a start date and an end date for each record. Some of the end dates are missing, and I want to fill those with a fixed date (caseDay). As you can see below, I have tried three different approaches and none of them works.

startDay <- as.Date(c("2015-01-01","2015-03-01","2016-07-15","2016-08-02"), "%Y-%m-%d")
endDay <- as.Date(c("2018-01-01",NA,"2018-03-05",NA), "%Y-%m-%d")
id <- 1:4
dt <- data.frame(id, startDay, endDay)
dt
str(dt)

dt$caseDay <- as.Date("2018-07-20", "%Y-%m-%d")  
str(dt)
dt

This one changes the class of my variable from date to numeric:

dt$EndDay1 <-
ifelse(is.na(dt$endDay), dt$caseDay, dt$endDay)
str(dt)
dt
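
A likely explanation for the class change: ifelse() builds its result from the logical test vector, so attributes such as the Date class on the yes/no values are dropped and only the underlying day counts remain. A minimal sketch, assuming the usual 1970-01-01 origin for Date values, that shows the effect and converts the numbers back (the names raw and fixed are only illustrative):

```r
# Same end dates and fill date as in the question
endDay  <- as.Date(c("2018-01-01", NA, "2018-03-05", NA))
caseDay <- as.Date("2018-07-20")

# ifelse() returns the underlying day counts, not Date values
raw <- ifelse(is.na(endDay), caseDay, endDay)
class(raw)

# Convert the day counts back to Date; the origin is the Date epoch
fixed <- as.Date(raw, origin = "1970-01-01")
fixed
```

This keeps ifelse() but needs the extra conversion step every time, which is easy to forget.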

This one generates an error message:

dt$EndDay2 <-as.Date(
ifelse(is.na(dt$endDay), dt$caseDay, dt$endDay), "%Y-%m-%d")
str(dt)
dt

If my understanding of related posts is correct, version 3 below should resolve the problem. However, it converts every value to NA:

dt$EndDay3 <-as.Date(as.character(
ifelse(is.na(dt$endDay), dt$caseDay, dt$endDay)), "%Y-%m-%d")
str(dt)
dt
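
One way to see why version 3 yields only missing values: the ifelse() result is numeric, so as.character() produces strings of day counts rather than dates, and those strings do not match "%Y-%m-%d". A small sketch of that reasoning (txt is just an illustrative name):

```r
# Same end dates and fill date as in the question
endDay  <- as.Date(c("2018-01-01", NA, "2018-03-05", NA))
caseDay <- as.Date("2018-07-20")

# The day counts are rendered as text such as "17532", not "2018-01-01"
txt <- as.character(ifelse(is.na(endDay), caseDay, endDay))
txt

# None of these strings match the format, so every element becomes NA
parsed <- as.Date(txt, "%Y-%m-%d")
parsed
```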

Any suggestions on how to solve this? Thanks.

Rui Barradas
TCS

1 Answer

Here's another idea:

library(dplyr)
library(lubridate)

We'll use lubridate::ymd and dplyr::case_when (the lubridate cheat sheet has more goodies).

Your data:

dt <- tibble(
  startDay = ymd(c("2015-01-01", "2015-03-01", "2016-07-15", "2016-08-02")),
  endDay = ymd(c("2018-01-01", NA, "2018-03-05", NA))
)

The caseDay:

caseDay <- ymd("2018-07-20")

Use case_when:

dt <- dt %>%
  mutate(endDay = case_when(is.na(endDay) ~ caseDay,
                            TRUE ~ endDay))

(Note: the TRUE case acts as a default that applies when none of the earlier conditions match.)

Result:

> dt
# A tibble: 4 x 2
  startDay   endDay    
  <date>     <date>    
1 2015-01-01 2018-01-01
2 2015-03-01 2018-07-20
3 2016-07-15 2018-03-05
4 2016-08-02 2018-07-20
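
Since there is only one condition here, a shorter variant may also work: dplyr::coalesce takes the first non-missing value at each position and should keep the Date class. A minimal sketch on the same data, using base as.Date instead of lubridate::ymd so that only dplyr is needed:

```r
library(dplyr)

# Same data as above, built with base as.Date()
dt <- tibble(
  startDay = as.Date(c("2015-01-01", "2015-03-01", "2016-07-15", "2016-08-02")),
  endDay   = as.Date(c("2018-01-01", NA, "2018-03-05", NA))
)
caseDay <- as.Date("2018-07-20")

# coalesce() fills each missing endDay with caseDay (recycled from length 1)
dt <- dt %>%
  mutate(endDay = coalesce(endDay, caseDay))
dt
```

case_when remains the better fit if more conditions are added later.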
Marian Minar
  • Thanks! Someone posted and deleted another simple solution. I have included it below in case other people find it useful:

        dt$EndDay1 <- dt$endDay
        dt$EndDay1[is.na(dt$endDay)] <- dt$caseDay[is.na(dt$endDay)]
        str(dt)
        dt

    – TCS Dec 22 '18 at 19:52
  • @TCS nice, post it as a solution (you are allowed to do this :-) ) – Marian Minar Dec 23 '18 at 06:37
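
For reference, the index-assignment idea from TCS's comment can be written out as a self-contained base R script. This is a sketch on the question's data; the helper name noEnd is only illustrative. Assigning into a subset of a Date vector keeps the Date class, which is why no conversion is needed:

```r
# Rebuild the question's data frame
dt <- data.frame(
  id       = 1:4,
  startDay = as.Date(c("2015-01-01", "2015-03-01", "2016-07-15", "2016-08-02")),
  endDay   = as.Date(c("2018-01-01", NA, "2018-03-05", NA))
)
dt$caseDay <- as.Date("2018-07-20")

# Logical index of the rows with a missing end date
noEnd <- is.na(dt$endDay)

# Copy endDay, then overwrite only the missing entries with caseDay
dt$EndDay1 <- dt$endDay
dt$EndDay1[noEnd] <- dt$caseDay[noEnd]
str(dt)
```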