Two suggestions here: convert all decimal-days to HH:MM:SS.SSS; or convert all timestamps to decimal days.
Convert all to HH:MM:SS
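We can use this function, num2time, to convert numeric values to times. The function itself works on seconds; the data here are "decimal days" (so 0.25 is a quarter of the way through the day, or 06:00:00), which is why the values are multiplied by 86400 (seconds per day) before being passed in.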
num2time <- function(x, digits.secs = getOption("digits.secs", 3)) {
  # x is a number of seconds since midnight
  hr <- as.integer(x %/% 3600)
  min <- as.integer((x - 3600*hr) %/% 60)
  sec <- (x - 3600*hr - 60*min)
  if (anyNA(digits.secs)) {
    # a mostly-arbitrary determination of significant digits,
    # motivated by @Roland https://stackoverflow.com/a/27767973
    for (digits.secs in 1:6) {
      if (any(abs(signif(sec, digits.secs) - sec) > (10^(-3 - digits.secs)))) next
      digits.secs <- digits.secs - 1L
      break
    }
  }
  # format the seconds, then left-pad single-digit seconds ("5.500" -> "05.500")
  sec <- sprintf(paste0("%02.", digits.secs[[1]], "f"), sec)
  sec <- paste0(ifelse(grepl("^[0-9]\\.", sec), "0", ""), sec)
  out <- sprintf("%02i:%02i:%s", hr, min, sec)
  out[is.na(x)] <- NA_character_
  out
}
With this,
nocolon <- !grepl(":", datos_texto)
datos_texto[nocolon] <- num2time(as.numeric(datos_texto[nocolon]) * 86400)
datos_texto
# [1] "05:59:28.000" "07:19:52" "02:57:46.667" "10:45:30" "13:37:45.803"
All values can now be handled the same way, whether you keep them as character strings or convert them into a "timestamp" (without date component) with something like
lubridate::hms(datos_texto)
# [1] "5H 59M 28S" "7H 19M 52S" "2H 57M 46.667S" "10H 45M 30S" "13H 37M 45.803S"
hms::parse_hms(datos_texto)
# 05:59:28.000
# 07:19:52.000
# 02:57:46.667
# 10:45:30.000
# 13:37:45.803
str(hms::parse_hms(datos_texto))
# 'hms' num [1:5] 05:59:28.000 07:19:52.000 02:57:46.667 10:45:30.000 ...
# - attr(*, "units")= chr "secs"
since in that format, numerical operations (plus, minus, difference, etc.) are clearly defined.
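If neither lubridate nor hms is available, a rough base-R equivalent is sketched below. It is only an illustration under stated assumptions: the sample strings are copied from the output above, the 1970-01-01 anchor date is arbitrary, and the variable names (x, secs, tod, gap) are made up for this example.

```r
# Time-of-day strings in HH:MM:SS[.SSS] form (sample values from the output above).
x <- c("05:59:28.000", "07:19:52", "02:57:46.667")

# Anchor every time to the same arbitrary date in UTC, so the numeric value
# of the resulting POSIXct is simply "seconds since midnight".
# %OS accepts seconds with or without a fractional part.
secs <- as.numeric(as.POSIXct(paste("1970-01-01", x),
                              format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"))

# Wrap the seconds in a difftime so the units travel with the numbers.
tod <- as.difftime(secs, units = "secs")

# Arithmetic is now well defined: time elapsed between the first two values.
gap <- tod[2] - tod[1]
print(gap)
```

The difference between the first two values is 4824 seconds (1 hour, 20 minutes, 24 seconds); as.numeric(gap, units = "mins") would report the same gap in minutes.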
Convert all to decimal-days
Another option is to convert the time-like fields to numeric.
time2num <- function(x) {
  vapply(strsplit(x, ':'), function(y) sum(as.numeric(y) * 60^((length(y)-1):0)),
         numeric(1), USE.NAMES=FALSE)
}
With this,
out <- numeric(length(datos_texto))
nocolon <- !grepl(":", datos_texto)
out[nocolon] <- as.numeric(datos_texto[nocolon])
out[!nocolon] <- time2num(datos_texto[!nocolon]) / 86400
out
# [1] 0.2496296 0.3054630 0.1234568 0.4482639 0.5678912
and now out is numeric, as decimal days for all of datos_texto.
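The weighting that time2num applies can be checked by hand on a single value. This is a minimal sketch using one sample time; the variable names (y, weights, secs, days) are only for illustration.

```r
# One sample time in HH:MM:SS form, split into its numeric fields.
y <- as.numeric(strsplit("05:59:28", ":", fixed = TRUE)[[1]])  # 5 59 28

# Positional weights: 60^2 for hours, 60^1 for minutes, 60^0 for seconds.
weights <- 60^((length(y) - 1):0)

secs <- sum(y * weights)   # seconds since midnight
days <- secs / 86400       # fraction of a 24-hour day

print(secs)   # 21568
print(days)   # 0.2496296
```

Because the weights are built from length(y), a two-field string would be weighted as 60 and 1, that is, read as MM:SS rather than HH:MM.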
Incidentally, one might be tempted to do datos_texto[nocolon] <- as.numeric(datos_texto[nocolon]). Realize that datos_texto, unless all of it is replaced at once, will remain character, so the results of as.numeric are immediately coerced back to strings. It is definitely possible to convert the :-containing strings with time2num in-place, but they too will be converted to strings, so you'll end up with:
datos_texto[!nocolon] <- time2num(datos_texto[!nocolon]) / 86400
datos_texto
# [1] "0.24962962962963" "0.305462962962963" "0.123456793981481" "0.448263888888889" "0.567891238425926"
The values are the same, but time2num returns floating-point numerics, and assigning them into a subset of datos_texto converts them to string representations of those floating-point numbers. They are easily converted back with
as.numeric(datos_texto)
# [1] 0.2496296 0.3054630 0.1234568 0.4482639 0.5678912
but converting from number to string and back to number is inefficient (and R is relatively inefficient with large numbers of strings: search for "R global string pool", see "Object size for characters in R - How does R global string pool work?" and https://adv-r.hadley.nz/names-values.html, and put your learning cap on). This also works, but I recommend using a numeric vector for this.
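The coercion behaviour described above is easy to reproduce on a tiny vector. The two-element sample and the names x and out below are hypothetical and only meant to illustrate the point.

```r
# Hypothetical sample: one decimal-day string and one clock-time string.
x <- c("0.25", "06:00:00")
nocolon <- !grepl(":", x)

# Writing into a separate numeric vector keeps the values numeric.
out <- numeric(length(x))
out[nocolon] <- as.numeric(x[nocolon])
print(class(out))   # "numeric"

# Assigning into a subset of the character vector coerces the numbers
# straight back to strings, so the vector stays character.
x[nocolon] <- as.numeric(x[nocolon])
print(class(x))     # "character"
print(x[1])         # "0.25"
```

This is why the approach above fills a fresh numeric vector instead of overwriting parts of datos_texto.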