I have a data frame with a character column describing working time. I want to convert that character vector to numeric values so that I can plot and analyze it. One difficulty is that the values come in two different formats:

A value is either 01.09:55:00, meaning 1 day, 9 hours, 55 minutes and 0 seconds, or, if one full day isn't reached, 04:30:00, meaning 4 hours, 30 minutes and 0 seconds. If it makes things easier, 100 days will never be reached. The hours, minutes and seconds are always present, but the number of days is not, and the days are separated by a '.' rather than a ':'.

I can already handle the second format using this to get the time in minutes:

MyData$Working_Time_Total = strptime(MyData$Working_Time_Total, format = "%H:%M:%S")
MyData$Working_Time_Total = (MyData$Working_Time_Total$sec + MyData$Working_Time_Total$min * 60 + MyData$Working_Time_Total$hour * 3600) / 60

I tried a splitting procedure that separates on the '.' and later recombines the pieces appropriately, but that leads to issues if there is no '.'. I would appreciate any help getting my desired result, which is a numeric vector that contains the time in minutes.

  • Hi. Welcome to SO. Could you dput part of your dates column? See also: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Lennyy Jun 15 '18 at 13:03

2 Answers

This is what I would do with the help of the as.ITime() function from the data.table package:

times <- c("04:30:00", "01.09:55:00")

library(data.table)
cols <- c("days", "hms")
as.data.table(times)[
  times %like% "[.]", (cols) := tstrsplit(times, "[.]")][
  is.na(days), (cols) := .( "0", times)][
    , as.integer(days) * 60 * 24 + as.integer(as.ITime(hms, "%H:%M:%S")) / 60][]
[1]  270 2035
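
To unpack the last step of the chain, here is a small sketch of the arithmetic for the 01.09:55:00 example. It relies on as.ITime() storing a time of day as integer seconds since midnight; the variable name secs is only for illustration.

```r
library(data.table)

# An ITime is stored as integer seconds since midnight:
# 9 * 3600 + 55 * 60 + 0
secs <- as.integer(as.ITime("09:55:00", format = "%H:%M:%S"))
secs
# [1] 35700

# Convert seconds to minutes, then add one full day (24 * 60 minutes)
1 * 60 * 24 + secs / 60
# [1] 2035
```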

Benchmark

# create benchmark data
times0 <- CJ(c("", sprintf("%02i.", 1:99)), 1:24, 1:60)[, sprintf("%s%02i:%02i:00", V1, V2, V3)]

# run benchmarks
microbenchmark::microbenchmark(
  apitsch = {
    times <- copy(times0)
    for (i in 1:length(times)){
      # for format without days:
      if (nchar(times[i]) == 8){
        tmp <- as.numeric(unlist(strsplit(times[i], split = ":")))
        times[i] <- tmp[1] * 60 + tmp[2] + tmp[3] * 1/60 
      } else { # for format including days:
        tmp <- c(unlist(strsplit(times[i], split = "[.]")))
        tmp <- c(tmp[1], unlist(strsplit(tmp[2], split = ":")))
        tmp <- as.numeric(tmp)
        times[i] <- tmp[1] * 24 * 60 + tmp[2] * 60 + tmp[3] + tmp[4] * 1/60
      }
    }
    times
  },
  uwe = {
    times <- copy(times0)
    cols <- c("days", "hms")
    as.data.table(times)[
      times %like% "[.]", (cols) := tstrsplit(times, "[.]")][
        is.na(days), (cols) := .( "0", times)][
          , as.integer(days) * 60 * 24 + as.integer(as.ITime(hms, "%H:%M:%S")) / 60][]
  },
  times = 11L
  )
Unit: milliseconds
    expr       min        lq      mean    median        uq      max neval cld
 apitsch 3485.6488 3561.5639 3708.8017 3631.2264 3747.1996 4288.368    11   b
     uwe  493.0976  497.6782  582.6732  540.5967  643.0875  773.587    11  a
Uwe

My solution might be a little bit of a hack, but it works. However, it is not vectorized and may be slow for big dataframes.

times <- c("04:30:00", "01.09:55:00")

# loop over "times" // rows of a column vector
for (i in seq_along(times)){
  # for format without days:
  if (nchar(times[i]) == 8){
    tmp <- as.numeric(unlist(strsplit(times[i], split = ":")))
    times[i] <- tmp[1] * 60 + tmp[2] + tmp[3] * 1/60 
  } else { # for format including days:
    tmp <- c(unlist(strsplit(times[i], split = "[.]")))
    tmp <- c(tmp[1], unlist(strsplit(tmp[2], split = ":")))
    tmp <- as.numeric(tmp)
    times[i] <- tmp[1] * 24 * 60 + tmp[2] * 60 + tmp[3] + tmp[4] * 1/60
  }
}

print(as.numeric(times))
# [1] 270  2035
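
If the loop turns out to be too slow, the same arithmetic can probably be vectorized in base R. The following is only a sketch: to_minutes is a hypothetical helper name, and it assumes every entry is non-missing and has exactly one of the two formats from the question.

```r
# Hypothetical helper (not part of the loop above): converts
# "DD.HH:MM:SS" or "HH:MM:SS" strings to minutes without a loop.
# Assumes all entries are well-formed and non-missing.
to_minutes <- function(x) {
  has_days <- grepl(".", x, fixed = TRUE)

  # Days part: the text before the ".", or 0 when there is no "."
  days <- numeric(length(x))
  days[has_days] <- as.numeric(sub("[.].*$", "", x[has_days]))

  # Clock part: everything after the "." (or the whole string)
  hms <- sub("^.*[.]", "", x)

  # 3 x n matrix with rows hours, minutes, seconds
  parts <- vapply(strsplit(hms, ":", fixed = TRUE), as.numeric, numeric(3))

  days * 24 * 60 + parts[1, ] * 60 + parts[2, ] + parts[3, ] / 60
}

to_minutes(c("04:30:00", "01.09:55:00"))
# [1]  270 2035
```

Applied to the column from the question, this would be MyData$Working_Time_Total <- to_minutes(MyData$Working_Time_Total).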
apitsch