
I have data where the time/duration appears as a string, e.g. "1 day, 4 hours, 58 minutes, 52 seconds" or "1 week, 1 day, 20 hours, 30 minutes, 49 seconds". How can I convert each duration to a number of days? The problem is that some rows have only seconds, some only minutes and seconds, and so on. Thank you!

Data sample:

Duration_1=c("43 weeks, 1 day, 18 hours, 59 minutes, 13 seconds", "12 seconds", "33 minutes, 58 seconds", "1 hour, 54 minutes, 3 seconds", "55 minutes, 4 seconds") 
Duration_2=c("55 seconds", "21 hours, 16 minutes, 40 seconds", "2 days, 46 minutes, 55 seconds", "13 hours, 53 minutes, 8 seconds", "15 weeks, 6 days, 5 hours, 37 minutes, 6 seconds") 
Duration=data.frame(Duration_1,Duration_2) 
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you – Sotos Jul 23 '18 at 09:16
  • What do you intend to do with the result? What you show there are durations. Is the expected result 1.207546 days for your first example? – Roland Jul 23 '18 at 09:16
  • I am trying to find the mean duration. The result can be in days or hours. – Eliel Epelbaum Jul 23 '18 at 09:27
  • Please provide your input data in a way that allows easy import to an R session. See this FAQ for best practices: https://stackoverflow.com/a/5963610/1412059 – Roland Jul 23 '18 at 09:29

2 Answers


Well, you need to write a parser using some simple regular expressions:

foo <- function(x) {
  x <- as.character(x)
  pattern <- "\\d+(?= second)" #lookahead regex (digits followed by space+seconds)
  secs <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  secs[lengths(secs) == 0] <- 0
  secs <- unlist(secs)

  pattern <- "\\d+(?= minute)"
  mins <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  mins[lengths(mins) == 0] <- 0
  mins <- unlist(mins)

  pattern <- "\\d+(?= hour)"
  hours <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  hours[lengths(hours) == 0] <- 0
  hours <- unlist(hours)

  pattern <- "\\d+(?= day)"
  days <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  days[lengths(days) == 0] <- 0
  days <- unlist(days)

  pattern <- "\\d+(?= week)"
  weeks <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  weeks[lengths(weeks) == 0] <- 0
  weeks <- unlist(weeks)

  tmp <- cbind(weeks, days, hours, mins, secs)
  mode(tmp) <- "numeric"

  mult <- c(7 * 24 * 3600, 24 * 3600, 3600, 60, 1) #result is in seconds
  c(tmp %*% mult)
}

Duration[] <- lapply(Duration, foo)
#Duration_1 Duration_2
#1   26161153         55
#2         12      76600
#3       2038     175615
#4       6843      49988
#5       3304    9610626
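foo() returns seconds, while the question asks for days and, per the comments, a mean duration. A minimal sketch, assuming the Duration data frame from the question: num_before() and to_seconds() below are hypothetical helpers that restate the same lookahead approach compactly (one pattern per unit, a missing unit counts as 0). Dividing by 86400 converts seconds to days, and colMeans() gives the mean per column.

```r
# Sample data from the question
Duration_1 <- c("43 weeks, 1 day, 18 hours, 59 minutes, 13 seconds", "12 seconds",
                "33 minutes, 58 seconds", "1 hour, 54 minutes, 3 seconds",
                "55 minutes, 4 seconds")
Duration_2 <- c("55 seconds", "21 hours, 16 minutes, 40 seconds",
                "2 days, 46 minutes, 55 seconds", "13 hours, 53 minutes, 8 seconds",
                "15 weeks, 6 days, 5 hours, 37 minutes, 6 seconds")
Duration <- data.frame(Duration_1, Duration_2)

# Hypothetical helper: the number directly in front of `unit` (lookahead, as in foo),
# or 0 when that unit does not occur in the string.
num_before <- function(x, unit) {
  pos <- regexpr(paste0("\\d+(?= ", unit, ")"), x, perl = TRUE)
  out <- numeric(length(x))
  hit <- pos != -1
  out[hit] <- as.numeric(regmatches(x, pos))
  out
}

# Hypothetical compact equivalent of foo(): total duration in seconds.
to_seconds <- function(x) {
  x <- as.character(x)
  mult <- c(week = 7 * 24 * 3600, day = 24 * 3600, hour = 3600,
            minute = 60, second = 1)
  Reduce(`+`, lapply(names(mult), function(u) num_before(x, u) * mult[[u]]))
}

secs <- sapply(Duration, to_seconds)  # 5 x 2 matrix, one column per variable
days <- secs / 86400                  # 86400 seconds per day
colMeans(days)                        # mean duration in days for each column
```

Hours work the same way: divide by 3600 instead of 86400.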
Roland

How can I convert the duration so it appears as the number of days?

As another solution, we could use difftime, e.g.:

unitnames = c(week="weeks", weeks="weeks", day="days", days="days", hour="hours", hours="hours",
              minute="mins", minutes="mins", second="secs", seconds="secs")
converdays =
function(w)
{ sapply(strsplit(w, ", "),  # for each string, separate the quantities by ", "
         function(x)
           do.call(sum,      # sum up the duration quantities, computed such:
                   lapply(strsplit(x, " "),  # split into magnitude and unit
                          function(y)        # convert to a "difftime" with that unit
                          { z = as.difftime(as.integer(y[1]), units=unitnames[y[2]])
                            units(z)="days"  # change that unit to the desired "days"
                            return(z)
                          }
                         )
                  )
        )
}

converdays(Duration_1)
# [1] 3.027911e+02 1.388889e-04 2.358796e-02 7.920139e-02 3.824074e-02
converdays(Duration_2)
# [1] 6.365741e-04 8.865741e-01 2.032581e+00 5.785648e-01 1.112341e+02
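The difftime approach relies on two properties of base R's difftime class: as.difftime() attaches a unit to a plain number, and assigning to units() rescales the value. Summing difftimes with different units falls back to seconds, which is why converdays() sets every piece to "days" before summing. A small sketch using the "43 weeks" component from the sample data plus an illustrative 1 day + 4 hours sum:

```r
# Turn "43 weeks" into a difftime and rescale it to days.
piece <- strsplit("43 weeks", " ")[[1]]                  # c("43", "weeks")
z <- as.difftime(as.integer(piece[1]), units = piece[2])
units(z) <- "days"                                        # 43 weeks -> 301 days
as.numeric(z)

# Mixed units: sum() converts everything to seconds first.
s <- sum(as.difftime(1, units = "days"), as.difftime(4, units = "hours"))
units(s)                                                  # "secs"
units(s) <- "days"                                        # 100800 secs -> 1.166667 days
as.numeric(s)
```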

Another variant, if you prefer the output to keep the difftime class so that it can easily be converted to other units, is:

unitnames = c(week ="weeks", day ="days", hour ="hours", minute ="mins", second ="secs",
              weeks="weeks", days="days", hours="hours", minutes="mins", seconds="secs")
csplit = function(x, s, f) do.call(c, lapply(strsplit(x, s), f))  # helper function to split
convertds = function(w)                  # convert to difftimes
             csplit(w, ", ",             # for each string, separate the quantities by ", "
                    function(x)
                     sum(csplit(x, " ",  # split into magnitude and unit, convert and sum up
                         function(y) as.difftime(as.integer(y[1]), units=unitnames[y[2]]))))
print (convertds(Duration_1) -> d1)
# Time differences in secs
# [1] 26161153       12     2038     6843     3304
units(d1)="days"
d1
# Time differences in days
# [1] 3.027911e+02 1.388889e-04 2.358796e-02 7.920139e-02 3.824074e-02
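Because the result keeps its difftime class, the mean duration mentioned in the comments can be taken directly and then shown in any unit. A minimal sketch; to keep it standalone it rebuilds d1 from the per-row totals in seconds printed above for Duration_1 instead of re-running convertds():

```r
# Per-row totals for Duration_1, in seconds (from the output above)
d1 <- as.difftime(c(26161153, 12, 2038, 6843, 3304), units = "secs")

m <- mean(d1)        # mean() keeps the difftime class and its unit
units(m) <- "days"   # 5234670 secs -> about 60.59 days
m
units(m) <- "hours"  # same mean expressed in hours
m

as.numeric(d1, units = "days")  # plain numeric vector in days, if needed
```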
Armali