0

I want to convert the column time to be in time decimal format and then find the time interval within each group of the user_id. I have tried the answer below, but I could not get it to work:

Days difference between two dates in same column in R

structure(list(question_id = c(5502L, 5502L, 5502L, 5502L, 5502L
), user_id = c(112197L, 112197L, 112197L, 114033L, 114033L), 
    time = structure(c(1603720173, 1603720388, 1603720702, 1603603115, 
    1603949442), class = c("POSIXct", "POSIXt"), tzone = ""), 
    prediction = c(0.9, 0.95, 0.9, 0.99, 0.94), log_score = c(0.84799690655495, 
    0.925999418556223, 0.84799690655495, 0.985500430304885, 0.910732661902913
    )), row.names = 156182:156186, class = "data.frame")
crestor
  • 1,388
  • 8
  • 21
m_zhang
  • 1
  • 2

2 Answers2

1

Perhaps this is what you're looking for?

library(dplyr)
user_data %>%
   group_by(user_id) %>%
   summarise(day.interval = difftime(max(time), min(time),units = "days"))
# A tibble: 2 x 2
  user_id day.interval    
    <int> <drtn>          
1  112197 0.006122685 days
2  114033 4.008414352 days
Ian Campbell
  • 23,484
  • 14
  • 36
  • 57
1
library(tidyverse)
library(lubridate)

df <- tibble::tribble(
  ~question_id, ~user_id, ~time, ~prediction, ~log_score,
  5502L,  112197L, "2020-10-26 14:49:33",         0.9,  0.84799690655495,
  5502L,  112197L, "2020-10-26 14:53:08",        0.95, 0.925999418556223,
  5502L,  112197L, "2020-10-26 14:58:22",         0.9,  0.84799690655495,
  5502L,  114033L, "2020-10-25 06:18:35",        0.99, 0.985500430304885,
  5502L,  114033L, "2020-10-29 06:30:42",        0.94, 0.910732661902913
)

df %>%
  as_tibble() %>%
  mutate(time = lubridate::ymd_hms(time)) %>%
  group_by(user_id) %>%
  mutate(diff = time - lag(time),
         diff2 = hms::hms(seconds_to_period(diff)))
#> # A tibble: 5 x 7
#> # Groups:   user_id [2]
#>   question_id user_id time                prediction log_score diff        diff2   
#>         <int>   <int> <dttm>                   <dbl>     <dbl> <drtn>      <time>  
#> 1        5502  112197 2020-10-26 14:49:33       0.9      0.848     NA secs       NA
#> 2        5502  112197 2020-10-26 14:53:08       0.95     0.926    215 secs 00:03:35
#> 3        5502  112197 2020-10-26 14:58:22       0.9      0.848    314 secs 00:05:14
#> 4        5502  114033 2020-10-25 06:18:35       0.99     0.986     NA secs       NA
#> 5        5502  114033 2020-10-29 06:30:42       0.94     0.911 346327 secs 96:12:07
crestor
  • 1,388
  • 8
  • 21
  • Hmm, that's a good alternative interpretation of what the question author wanted. Hopefully they will let us know which they ultimately wanted. – Ian Campbell Apr 12 '21 at 17:43
  • I think this code would help me find the correct time interval if I can convert each time to time decimal first and then find the time interval between each time. For example, if "2020-10-26 14:49:33" then the time decimal will be 1.200833333 and the second time "2020-10-26 14:53:08" will be 1.260555556. So the interval would be 0.05972222222. How can I do this correctly? – m_zhang Apr 13 '21 at 02:40
  • If you want to represent durations as decimal parts of an hour, you can convert the `diff` column: `diff3 = as.numeric(diff / 3600)`. – crestor Apr 13 '21 at 03:49