
I have seen a lot of questions relating to formatting times, but none in the particular imported format that I have:

Time <- c(
"22 hours 3 minutes 22 seconds", 
"170 hours 15 minutes 20 seconds", 
"39 seconds", 
"2 days 6 hours 44 minutes 17 seconds", 
"9 hours 54 minutes 36 seconds", 
"357 hours 23 minutes 28 seconds", 
"464 hours 30 minutes 7 seconds", 
"51 seconds", 
"31 hours 39 minutes 2 seconds", 
"355 hours 29 minutes 10 seconds")

Some times contain only "seconds", and others "minutes and seconds", "days, hours, minutes and seconds", "days and seconds", etc. There are also NA values that I need to keep. How can I convert this character vector into the numeric total number of days (i.e., add up the days, hours, minutes and seconds)?

For example:

Time
8.10
19.3
0.68
2.28
48.1
0.00
0.70
0.1
3.2
13.9

Thank you!

EDIT

Old question, but a simple lubridate call does the trick now:

(period_to_seconds(period(Time)) / 86400) %>% round(2)

This also does the trick without packages, apart from %>% (from magrittr) for readability:

Time_vec <- mapply(function(tt, to_days) {
  ifelse(grepl(tt, Time), gsub(paste0("^.*?(\\d+) ", tt, ".*$"), "\\1", Time), 0) %>%
    as.numeric() / to_days
    },
  c("day", "hour", "minute", "second"),
  c(1, 24, 1440, 86400)
) %>%
  apply(1, sum) %>% 
  round(2)

In my actual data, only one value was different than the lubridate solution, 0.96 vs 0.97.

Tunn

4 Answers


Again without packages, using a little regex:

Time <- c(
  "22 hours 3 minutes 22 seconds", 
  "170 hours 15 minutes 20 seconds", 
  "39 seconds", 
  "6 hours 44 minutes 17 seconds", 
  "9 hours 54 minutes 36 seconds", 
  "357 hours 23 minutes 28 seconds", 
  "464 hours 30 minutes 7 seconds", 
  "51 seconds", 
  "31 hours 39 minutes 2 seconds", 
  "355 hours 29 minutes 10 seconds")

pat <- '(?:(\\d+) hours )?(?:(\\d+) minutes )?(?:(\\d+) seconds)?'
m <- regexpr(pat, Time, perl = TRUE)

m_st <- attr(m, 'capture.start')
m_ln <- attr(m, 'capture.length')

(mm <- mapply(function(x, y) as.numeric(substr(Time, x, y)),
              data.frame(m_st), data.frame(m_st + m_ln - 1)))

(dd <- setNames(data.frame(mm), c('h','m','s')))
#      h  m  s
# 1   22  3 22
# 2  170 15 20
# 3   NA NA 39
# 4    6 44 17
# 5    9 54 36
# 6  357 23 28
# 7  464 30  7
# 8   NA NA 51
# 9   31 39  2
# 10 355 29 10

round(rowSums(dd / data.frame(h = rep(24, nrow(dd)), m = 24 * 60, s = 24 * 60 * 60),
        na.rm = TRUE), 3)
# [1]  0.919  7.094  0.000  0.281  0.413 14.891 19.354  0.001  1.319 14.812
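
The pattern above has no group for days, and the question also mentions NA values. A minimal base R sketch that covers both is below. It assumes each unit appears at most once per string and is written as a number, a space, then the unit name; `time_to_days` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: total days from strings such as
# "2 days 6 hours 44 minutes 17 seconds"; NA input stays NA.
time_to_days <- function(x) {
  # how many of each unit make up one day
  per_day <- c(day = 1, hour = 24, minute = 1440, second = 86400)
  total <- rep(0, length(x))
  for (u in names(per_day)) {
    # strings that contain "<number> <unit>"; missing values are skipped
    has_unit <- !is.na(x) & grepl(paste0("\\d+ ", u), x)
    # pull out the number that sits directly in front of the unit name
    number <- sub(paste0("^.*?(\\d+) ", u, ".*$"), "\\1", x[has_unit], perl = TRUE)
    total[has_unit] <- total[has_unit] + as.numeric(number) / per_day[[u]]
  }
  total[is.na(x)] <- NA  # keep missing values missing
  total
}

res <- time_to_days(c("2 days 6 hours 44 minutes 17 seconds", "39 seconds", NA))
```

Here the first element comes to roughly 2.28 days, the second to 39 / 86400 of a day, and the third stays NA. Looping over units rather than using one combined pattern means the units may appear in any combination.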
rawr
  • Everything works great until I get to the last code `round(rowSums(dd / data.frame(h = rep(24, 10), m = 24 * 60, s = 24 * 60 * 60), na.rm = TRUE), 3)` I get this error `Error in Ops.data.frame(dd, data.frame(h = rep(24, 10), m = 24 * 60, s = 24 * : ‘/’ only defined for equally-sized data frames` – Tunn Jan 29 '16 at 20:49
  • sorry try replacing `rep(24, 10)` with `rep(24, nrow(dd))` see if that is the problem – rawr Jan 29 '16 at 22:16

lubridate is useful here. hms automatically extracts hours, minutes, and seconds (saving you some regex), and time_length converts to days.

> library(lubridate)
> time_length(hms(Time), 'day')
estimate only: convert periods to intervals for accuracy
 [1]  0.9190046  7.0939815         NA  0.2807523  0.4129167 14.8912963 19.3542477         NA
 [9]  1.3187731 14.8119213

However, hms fails to parse if there aren't three numbers, so a little pre-scrubbing can help:

> library(stringr)
> Time2 <- sapply(Time, function(x){paste(paste(rep(0, 3 - str_count(x, '[0-9]+')), collapse = ' '), x)})
> time_length(hms(Time2), 'day')
estimate only: convert periods to intervals for accuracy
 [1] 9.190046e-01 7.093981e+00 4.513889e-04 2.807523e-01 4.129167e-01 1.489130e+01 1.935425e+01
 [8] 5.902778e-04 1.318773e+00 1.481192e+01
alistaire
  • Sorry, that's from `stringr`. If you want to stick to base R, you could replace it with `length(unlist(gregexpr('[0-9]+', x)))`, which does the same thing, but is longer and harder to understand. I didn't explain that whole expression very well; `sapply` passes each item in `Time` to a function that uses `stringr::str_count` to count the number of separate numbers in it. You need 3, so I subtracted from 3, and used `rep(0` to `paste` on extra 0s in front. There are two `paste`s because if there are two zeros you need to `collapse` so you only get out a single string. – alistaire Jan 29 '16 at 17:48
  • Great! Thank you for the explanation. This however, is what happened when I ran `Time2 <- sapply(Time, function(x){paste(paste(rep(0, 3 - str_count(x, '[0-9]+')), collapse = ' '), x)})` This error: `Time2 <- sapply(Time, function(x){paste(paste(rep(0, 3 - str_count(x, '[0-9]+')), collapse = ' '), x)}) Error in rep(0, 3 - str_count(x, "[0-9]+")) : invalid 'times' argument Called from: paste(rep(0, 3 - str_count(x, "[0-9]+")), collapse = " ") Browse[1]>` Did I miss something? – Tunn Jan 29 '16 at 20:41
  • Oh, I didn't know you had `NA`s in your data. You can't pass `rep` an `NA` argument for how many times to repeat, so wrap the `x` in `str_count` in an `ifelse` to catch the `NA`s: `ifelse(is.na(x), '', x)`. `NA`s will come out to 0 in the end, but you can always put the `NA`s back with `Time3[is.na(Time)] <- NA`, where `Time3` is the result of `time_length( ...`. – alistaire Jan 29 '16 at 21:12
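
The NA handling discussed in these comments could be sketched as follows. This is one possible variant, not the exact `ifelse` wrapping suggested above: it pads and parses only the non-missing entries and leaves the rest as NA. `pad_to_hms`, `ok` and the three-element `Time` are illustrative names and data, and the sketch still assumes at most three numbers per string (so no "days" part).

```r
library(lubridate)
library(stringr)

# Illustrative input: a complete entry, a seconds-only entry, and a missing one
Time <- c("22 hours 3 minutes 22 seconds", "39 seconds", NA)

# Hypothetical helper: prefix zeros so each string holds three numbers for hms()
pad_to_hms <- function(x) {
  n_missing <- 3 - str_count(x, "[0-9]+")
  paste(paste(rep(0, n_missing), collapse = " "), x)
}

ok <- !is.na(Time)                    # entries that can be parsed
Time3 <- rep(NA_real_, length(Time))  # missing entries stay NA by construction
padded <- vapply(Time[ok], pad_to_hms, character(1), USE.NAMES = FALSE)
Time3[ok] <- time_length(hms(padded), "day")
```

Because `rep` never sees an NA count here, the "invalid 'times' argument" error does not arise, and there is no need to put the NA values back afterwards.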

I recommend installing the stringr package and then doing this:

library(stringr)
options(digits=7)
returndays <- function(alist){
        val <-length(alist)
        #print(val)
        hr <- vector()
        min <- vector()
        sec <- vector()
        day <- vector()
        for (i in 1:val){
                myinfo <-"([1-9][0-9]{0,2}) hours" 
                hr[i] <-str_match(alist[i],myinfo)[,2]
                myinfo2 <-"([1-9][0-9]{0,2}) minutes" 
                min[i] <-str_match(alist[i],myinfo2)[,2]
                myinfo3 <-"([1-9][0-9]{0,2}) seconds" 
                sec[i] <-str_match(alist[i],myinfo3)[,2]

                h <- as.numeric(hr[i])/24

                m <- as.numeric(min[i])/1440

                s <- as.numeric(sec[i])/86400

               # sum the parts separately so a missing unit counts as 0
               day[i] <- sum(h, m, s, na.rm = TRUE)


        }

        return(day)

}

days <-returndays(Time)

days

[1] 9.190046e-01 7.093981e+00 4.513889e-04 2.807523e-01 4.129167e-01 1.489130e+01 1.935425e+01
[8] 5.902778e-04 1.318773e+00 1.481192e+01
Luis Candanedo

lubridate offers the function period(), which conveniently converts hours, minutes, seconds etc. to a period object that can easily be converted to seconds:

period(days = 3, hours = 10, minutes = 3, seconds = 37)
## [1] "3d 10H 3M 37S"

I use this function to convert your character strings:

to_days <- function(hms_char) {

   # split string
   v <- strsplit(hms_char, " ")[[1]]
   # get numbers
   idx <- seq(1, by = 2, length = length(v)/2)
   nums <- as.list(v[idx])
   # get units and use them as names
   names(nums) <- v[-idx]
   # apply functions, sum and convert to days
   duration <- do.call(period, nums)
   days <- period_to_seconds(duration)/86400

   return(days)
}

It works on a single character string, so you will need to use sapply to convert the complete Time vector:

sapply(Time, to_days, USE.NAMES = FALSE)
## [1] 9.190046e-01 7.093981e+00 4.513889e-04 2.807523e-01 4.129167e-01 1.489130e+01 1.935425e+01
## [8] 5.902778e-04 1.318773e+00 1.481192e+01
Stibu