1

I need to calculate the timed difference and make a variable 'duration'. The sample data looks like this:

                endTime                   startTime     user_id      categories duration
1      2019-02-22T19:17:02.618  019-02-22T19:16:58.377   10224   communication   0 secs
2      2019-02-23T12:19:01.055 2019-02-23T12:18:44.414   10224   communication   0 secs
3      2019-02-25T21:03:15.771 2019-02-25T21:03:06.961   10224 utility & tools   0 secs
4      2019-02-27T19:22:41.174 2019-02-27T19:22:32.246   10224   communication   0 secs

endTime and startTime are all set to Date format using as.POSIXct. I use the difftime in base R, code looks like this:

dat$duration <- difftime(dat$startTime,dat$endTime)

and all the duration has a value of 0. I don't understand why this is happening. I checked some other libraries too (chron, lubridate) for calculating this, seems that they only accept one string containing both times, instead of two variables. Doesn't seem like wise for me to merge both variables into one string..is there an easier way? Thank you!!

dput:

structure(list(battery = c(47L, 41L, 18L, 94L, 94L, 93L, 73L, 
73L, 47L, 49L), endTime = c("2019-02-22T19:17:02.618", "2019-02-23T12:19:01.055", 
"2019-02-25T21:03:15.771", "2019-02-27T19:22:41.174", "2019-02-27T19:22:53.256", 
"2019-02-27T23:51:16.407", "2019-03-02T20:18:28.090", "2019-03-02T20:18:43.488", 
"2019-03-19T13:07:16.993", "2019-03-19T12:16:36.962"), session = c(1550859371L, 
1550920714L, 1551124876L, 1551291720L, 1551291720L, 1551307871L, 
1551554295L, 1551554295L, 1552997232L, 1552994133L), startTime = c("2019-02-22T19:16:58.377", 
"2019-02-23T12:18:44.414", "2019-02-25T21:03:06.961", "2019-02-27T19:22:32.246", 
"2019-02-27T19:22:45.404", "2019-02-27T23:51:15.270", "2019-03-02T20:18:21.362", 
"2019-03-02T20:18:37.066", "2019-03-19T13:07:15.348", "2019-03-19T12:15:38.440"
), user_id = c(10224L, 10224L, 10224L, 10224L, 10224L, 10224L, 
10224L, 10224L, 10224L, 10224L), categories = structure(c(1L, 
1L, 6L, 1L, 2L, 2L, 6L, 1L, 2L, 1L), .Label = c("communication", 
"games & entertainment", "lifestyle", "news & information outlet", 
"social network", "utility & tools"), class = "factor"), duration = structure(c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), class = "difftime", units = "secs")), row.names = c(NA, 
10L), class = "data.frame")
Cettt
  • 11,460
  • 7
  • 35
  • 58
senera
  • 85
  • 5
  • 1
    Are you certain that the variables are in `Date` format? If they are, most likely `difftime` returns the difference in `days`, which equals zero in your example. – JDG Oct 31 '19 at 14:42
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Share a `dput()` of your data so we can see exactly what's in there. Just seeing how the data prints to the screen isn't enough. – MrFlick Oct 31 '19 at 14:46
  • @JDG Hi, the date is not that important for my analysis, that is why I tried to remove it and only leave the hour/minute/second, so it won't calculate the day difference. However, it did not work...could you please recommend how to set the time into Date format then? This is my first time working with time data... thank you! – senera Oct 31 '19 at 14:47
  • You just pasted it in text format. We cannot do anything with this. Paste the whole `dput()` output, i.e. it looks like `structure(...)` – JDG Oct 31 '19 at 14:52
  • Mostly likely your date columns are character strings or factors and are not date/time objects. If True, you will need to convert using the `as.POSIXct()` function and then `difftime()`. Here is a good example: https://stackoverflow.com/questions/21667212/converting-datetime-string-to-posixct-date-time-format-in-r – Dave2e Oct 31 '19 at 14:53
  • Hi @Dave2e, thank you ! I set the data into Date format using the method you provided. However, it still returns me all 0 for duration.... – senera Oct 31 '19 at 15:04
  • @JDG I have already transformed the data into Date format using s.POSIXct(), but the duration is 0. Please see the updated question for dput. – senera Oct 31 '19 at 15:05

3 Answers3

4

The following works for me. The difftime's are not zero.

op_digs <- options(digits.secs = 3)

dat$endTime <- as.POSIXct(dat$endTime, format = "%Y-%m-%dT%H:%M:%OS")
dat$startTime <- as.POSIXct(dat$startTime, format = "%Y-%m-%dT%H:%M:%OS")
difftime(dat$startTime,dat$endTime)
#Time differences in secs
#[1]  -4.241 -16.641  -8.810  -8.928

dat$duration <- difftime(dat$startTime,dat$endTime)

options(digits.secs = op_digs)

Data.

dat <- read.table(text = "
                endTime                   startTime     user_id      categories duration
1      2019-02-22T19:17:02.618 2019-02-22T19:16:58.377   10224   communication   '0 secs'
2      2019-02-23T12:19:01.055 2019-02-23T12:18:44.414   10224   communication   '0 secs'
3      2019-02-25T21:03:15.771 2019-02-25T21:03:06.961   10224 'utility & tools'   '0 secs'
4      2019-02-27T19:22:41.174 2019-02-27T19:22:32.246   10224   communication   '0 secs'
", header = TRUE)
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • thank you! When I ran the exactly same code, it gives me an error when I open to check the dataset, saying 'r error 4 (R code execution error)'. When I removed the last line 'options(digits.secs = op_digs)', it works. But if I remove the first line 'op_digs <- options(digits.secs = 3)' as well, then it only gives me 0. Very strange but I am glad this worked out! – senera Oct 31 '19 at 15:24
2

here is a solution based on dplyr which has a more fluent workflow:

library(dplyr)
dat %>%
  mutate_at(vars(startTime, endTime), ~as.POSIXct(strptime(.x, format = c("%Y-%m-%dT%H:%M:%OS")))) %>%
  mutate(duration = endTime - startTime)

   battery             endTime    session           startTime user_id            categories    duration
1       47 2019-02-22 19:17:02 1550859371 2019-02-22 19:16:58   10224         communication  4.241 secs
2       41 2019-02-23 12:19:01 1550920714 2019-02-23 12:18:44   10224         communication 16.641 secs
3       18 2019-02-25 21:03:15 1551124876 2019-02-25 21:03:06   10224       utility & tools  8.810 secs
4       94 2019-02-27 19:22:41 1551291720 2019-02-27 19:22:32   10224         communication  8.928 secs
5       94 2019-02-27 19:22:53 1551291720 2019-02-27 19:22:45   10224 games & entertainment  7.852 secs
6       93 2019-02-27 23:51:16 1551307871 2019-02-27 23:51:15   10224 games & entertainment  1.137 secs
7       73 2019-03-02 20:18:28 1551554295 2019-03-02 20:18:21   10224       utility & tools  6.728 secs
8       73 2019-03-02 20:18:43 1551554295 2019-03-02 20:18:37   10224         communication  6.422 secs
9       47 2019-03-19 13:07:16 1552997232 2019-03-19 13:07:15   10224 games & entertainment  1.645 secs
10      49 2019-03-19 12:16:36 1552994133 2019-03-19 12:15:38   10224         communication 58.522 secs

Basically, you first convert the columns endTime and startTime to a proper format (POSIXct) and then use simple subtraction.

Cettt
  • 11,460
  • 7
  • 35
  • 58
0

Here is how to solve this with the lubridate and dplyr packages. It will provide the same results as above. Just in a different way

df<-structure(list(battery = c(47L, 41L, 18L, 94L, 94L, 93L, 73L, 73L, 47L, 49L), 
                   endTime = c("2019-02-22T19:17:02.618", "2019-02-23T12:19:01.055", "2019-02-25T21:03:15.771", "2019-02-27T19:22:41.174", "2019-02-27T19:22:53.256", "2019-02-27T23:51:16.407", "2019-03-02T20:18:28.090", "2019-03-02T20:18:43.488", "2019-03-19T13:07:16.993", "2019-03-19T12:16:36.962"), 
                   session = c(1550859371L,1550920714L, 1551124876L, 1551291720L, 1551291720L, 1551307871L,1551554295L, 1551554295L, 1552997232L, 1552994133L),
                   startTime = c("2019-02-22T19:16:58.377",  "2019-02-23T12:18:44.414", "2019-02-25T21:03:06.961", "2019-02-27T19:22:32.246", "2019-02-27T19:22:45.404", "2019-02-27T23:51:15.270", "2019-03-02T20:18:21.362", "2019-03-02T20:18:37.066", "2019-03-19T13:07:15.348", "2019-03-19T12:15:38.440"),
                   user_id = c(10224L, 10224L, 10224L, 10224L, 10224L, 10224L, 10224L, 10224L, 10224L, 10224L), 
                   categories = structure(c(1L, 1L, 6L, 1L, 2L, 2L, 6L, 1L, 2L, 1L), .Label = c("communication", "games & entertainment", "lifestyle", "news & information outlet", "social network", "utility & tools"), class = "factor"), 
                   duration = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), class = "difftime", units = "secs")), row.names = c(NA, 10L), class = "data.frame")

library(dplyr)
library(lubridate)
df<-df %>%
  mutate(endTime = ymd_hms(endTime), startTime = ymd_hms(startTime)) %>%
  mutate(duration = endTime - startTime)
df2

this is the output

   battery             endTime    session           startTime user_id            categories    duration
1       47 2019-02-22 19:17:02 1550859371 2019-02-22 19:16:58   10224         communication  4.241 secs
2       41 2019-02-23 12:19:01 1550920714 2019-02-23 12:18:44   10224         communication 16.641 secs
3       18 2019-02-25 21:03:15 1551124876 2019-02-25 21:03:06   10224       utility & tools  8.810 secs
4       94 2019-02-27 19:22:41 1551291720 2019-02-27 19:22:32   10224         communication  8.928 secs
5       94 2019-02-27 19:22:53 1551291720 2019-02-27 19:22:45   10224 games & entertainment  7.852 secs
6       93 2019-02-27 23:51:16 1551307871 2019-02-27 23:51:15   10224 games & entertainment  1.137 secs
7       73 2019-03-02 20:18:28 1551554295 2019-03-02 20:18:21   10224       utility & tools  6.728 secs
8       73 2019-03-02 20:18:43 1551554295 2019-03-02 20:18:37   10224         communication  6.422 secs
9       47 2019-03-19 13:07:16 1552997232 2019-03-19 13:07:15   10224 games & entertainment  1.645 secs
10      49 2019-03-19 12:16:36 1552994133 2019-03-19 12:15:38   10224         communication 58.522 secs
Michael Vine
  • 335
  • 1
  • 9