0

I want to replace NA in a df column with date from another column + 23:59:59. Below is the sample data and my code. I'm facing issues in doing this. Any lights here..?

code:

df1[is.na(df1$start_time)] <- as.character(paste(dfs$Date,"23:59:59",sep=" "))

df1:

Date                start_time
5/30/2020                NA
5/30/2020         30/05/2020 07:33:12
5/30/2020                NA
5/30/2020         30/05/2020 09:33:12

Output needed:

  Date                start_time
5/30/2020          30/05/2020 23:59:59
5/30/2020          30/05/2020 07:33:12
5/30/2020          30/05/2020 23:59:59
5/30/2020          30/05/2020 09:33:12
zx8754
  • 52,746
  • 12
  • 114
  • 209
Rajhesh
  • 21
  • 4
  • Please follow Stack Overflow guidelines for proper data sharing and provide e.g. `dput(df1)`, read [how-to-make-a-great-r-reproducible-example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) – jay.sf Jul 02 '20 at 10:39
  • Is there a way to use dput(df[1:4, c(2, 15)]), so that I can produce the data in minimum with specific headers? – Rajhesh Jul 02 '20 at 10:52
  • 1
    Why do you have different date format in `Date` column and `start_time` ? – Ronak Shah Jul 02 '20 at 11:29

2 Answers2

0

Using base R changing the Date column to "Date" class and "start_time" to POSIXct.

#Change to Date to date class
df$Date <- as.Date(df$Date, "%m/%d/%Y")
#Change start_time to POSIXct
df$start_time <- as.POSIXct(df$start_time, format = '%d/%m/%Y %T', tz = 'UTC')
#Find out NA indices
inds <- is.na(df$start_time)
#Replace the NA values taking Date and adding time as 23:59:59.
df$start_time[inds] <- as.POSIXct(paste0(df$Date[inds], '23:59:59'), "UTC")
df

#        Date          start_time
#1 2020-05-30 2020-05-30 23:59:59
#2 2020-05-30 2020-05-30 07:33:12
#3 2020-05-30 2020-05-30 23:59:59
#4 2020-05-30 2020-05-30 09:33:12

Similar way using lubridate functions :

library(lubridate)
df$Date <- mdy(df$Date)
df$start_time <- dmy_hms(df$start_time)
inds <- is.na(df$start_time)
df$start_time[inds] <- ymd_hms(paste0(df$Date[inds], '23:59:59'))

data

df <- structure(list(Date = c("5/30/2020", "5/30/2020", "5/30/2020", 
"5/30/2020"), start_time = c(NA, "30/05/2020 07:33:12", NA, "30/05/2020 09:33:12"
)), class = "data.frame", row.names = c(NA, -4L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

If you don't mind converting your columns in different formats to proper dates and times, you can use dplyr's coalesce() function along with lubridate's durations:

library(lubridate)
library(dplyr)

df %>%
  mutate(Date       = mdy(Date),
         start_time = coalesce(dmy_hms(start_time), Date + ddays(1) - dseconds(1)))
#>         Date          start_time
#> 1 2020-05-30 2020-05-30 23:59:59
#> 2 2020-05-30 2020-05-30 07:33:12
#> 3 2020-05-30 2020-05-30 23:59:59
#> 4 2020-05-30 2020-05-30 09:33:12

You simply convert the Date column with mdy(), and convert start_time column with dmy_hms() along with replacing all NAs by Date + 1 day - 1 second (i.e. duration of 23:59:59).

Data:

df <- structure(list(Date = c("5/30/2020", "5/30/2020", "5/30/2020", 
  "5/30/2020"), start_time = c(NA, "30/05/2020 07:33:12", NA, "30/05/2020 09:33:12"
  )), class = "data.frame", row.names = c(NA, -4L))

Created on 2020-07-03 by the reprex package (v0.3.0)