I am using this code to get difference in hours from two POSIXct dates.
x <- transform(x, HRS = ceiling(as.numeric(SHIP_DATE-PICK_DATE)))
This gives accurate results. However, when I tried to find the hour differences for another similar column, I needed to do this:
x <- transform(x, HRS_ADJ = ceiling(as.numeric(SHIP_DATE-ADJ_PICK_DATE)/60))
PICK_DATE & SHIP_DATE are extracted using the same formula.
x$SHIP_DATE <- ifelse(is.na(as.POSIXct(x$SHIP_DATE, format="%d-%b-%Y %H:%M %p")),
yes = as.POSIXct(x$SHIP_DATE, format="%d-%b-%Y %H:%M"),
no = as.POSIXct(x$SHIP_DATE, format="%d-%b-%Y %H:%M %p"))
x$SHIP_DATE <- as.POSIXct(x$SHIP_DATE, origin = "1970-01-01")
ADJ_PICK_DATE is computed as below:
x$ADJ_PICK_DATE <- ifelse(x$PICK_TIME=="EARLY",
as.POSIXct(paste(format(x$PICK_DATE, "%d-%b-%Y"), "03:00"),
format="%d-%b-%Y %H:%M"), x$PICK_DATE)
x$ADJ_PICK_DATE <- ifelse(x$PICK_TIME=="LATE",
as.POSIXct(paste(format(x$PICK_DATE+86400, "%d-%b-%Y"),
"03:00"), format="%d-%b-%Y %H:%M"),
x$ADJ_PICK_DATE)
x$ADJ_PICK_DATE <- as.POSIXct(x$ADJ_PICK_DATE, origin = "1970-01-01")
PICK_TIME is computed to adjust the PICK_DATE, as for any orders between 16:00 & 03:00, the lead time is to be calculated from 3AM.
Questions:
- How to efficiently generate the ADJ_PICK_DATE column (now it is too slow)?
- How to extract the source data into POSIXct using shorter and more efficient code? (It takes about 10-15 seconds per million data point on my i7 7th Gen CPU)
- Why did I need to use different formula for each pair of dates to calculate the no of days?
Sample data (The dates is formatted randomly in the source (PICK_DATE & SHIP_DATE) as both "DD-MMM-YYYY HH:mm" and "DD-MMM-YYYY hh:mm AM/PM"):
PICK_DATE SHIP_DATE PICK_TIME
01-APR-2017 00:51 02-APR-2017 06:55 EARLY
01-APR-2017 00:51 02-APR-2017 12:11 PM EARLY
01-APR-2017 07:51 02-APR-2017 12:11 PM OKAY
01-APR-2017 02:51 PM 02-APR-2017 09:39 AM LATE