Replace NA with a date from another column and add 23:59:59 with it

Question

I want to replace NA in a df column with date from another column + 23:59:59. Below is the sample data and my code. I'm facing issues in doing this. Any lights here..?

code:

df1[is.na(df1$start_time)] <- as.character(paste(dfs$Date,"23:59:59",sep=" "))

df1:

Date                start_time
5/30/2020                NA
5/30/2020         30/05/2020 07:33:12
5/30/2020                NA
5/30/2020         30/05/2020 09:33:12

Output needed:

  Date                start_time
5/30/2020          30/05/2020 23:59:59
5/30/2020          30/05/2020 07:33:12
5/30/2020          30/05/2020 23:59:59
5/30/2020          30/05/2020 09:33:12

Please follow Stack Overflow guidelines for proper data sharing and provide e.g. `dput(df1)`, read [how-to-make-a-great-r-reproducible-example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) — jay.sf, Jul 02 '20 at 10:39
Is there a way to use dput(df[1:4, c(2, 15)]), so that I can produce the data in minimum with specific headers? — Rajhesh, Jul 02 '20 at 10:52
Why do you have different date format in `Date` column and `start_time` ? — Ronak Shah, Jul 02 '20 at 11:29

score 0 · Answer 1 · answered Jul 02 '20 at 12:17

Using base R changing the Date column to "Date" class and "start_time" to POSIXct.

#Change to Date to date class
df$Date <- as.Date(df$Date, "%m/%d/%Y")
#Change start_time to POSIXct
df$start_time <- as.POSIXct(df$start_time, format = '%d/%m/%Y %T', tz = 'UTC')
#Find out NA indices
inds <- is.na(df$start_time)
#Replace the NA values taking Date and adding time as 23:59:59.
df$start_time[inds] <- as.POSIXct(paste0(df$Date[inds], '23:59:59'), "UTC")
df

#        Date          start_time
#1 2020-05-30 2020-05-30 23:59:59
#2 2020-05-30 2020-05-30 07:33:12
#3 2020-05-30 2020-05-30 23:59:59
#4 2020-05-30 2020-05-30 09:33:12

Similar way using lubridate functions :

library(lubridate)
df$Date <- mdy(df$Date)
df$start_time <- dmy_hms(df$start_time)
inds <- is.na(df$start_time)
df$start_time[inds] <- ymd_hms(paste0(df$Date[inds], '23:59:59'))

data

df <- structure(list(Date = c("5/30/2020", "5/30/2020", "5/30/2020", 
"5/30/2020"), start_time = c(NA, "30/05/2020 07:33:12", NA, "30/05/2020 09:33:12"
)), class = "data.frame", row.names = c(NA, -4L))

score 0 · Answer 2 · answered Jul 02 '20 at 22:37

If you don't mind converting your columns in different formats to proper dates and times, you can use dplyr's coalesce() function along with lubridate's durations:

library(lubridate)
library(dplyr)

df %>%
  mutate(Date       = mdy(Date),
         start_time = coalesce(dmy_hms(start_time), Date + ddays(1) - dseconds(1)))
#>         Date          start_time
#> 1 2020-05-30 2020-05-30 23:59:59
#> 2 2020-05-30 2020-05-30 07:33:12
#> 3 2020-05-30 2020-05-30 23:59:59
#> 4 2020-05-30 2020-05-30 09:33:12

You simply convert the Date column with mdy(), and convert start_time column with dmy_hms() along with replacing all NAs by Date + 1 day - 1 second (i.e. duration of 23:59:59).

Data:

df <- structure(list(Date = c("5/30/2020", "5/30/2020", "5/30/2020", 
  "5/30/2020"), start_time = c(NA, "30/05/2020 07:33:12", NA, "30/05/2020 09:33:12"
  )), class = "data.frame", row.names = c(NA, -4L))

^{Created on 2020-07-03 by the reprex package (v0.3.0)}

Replace NA with a date from another column and add 23:59:59 with it

2 Answers2