
I have some data that contains messages in a conversation, and I need to calculate how long it takes someone to message back. I have unique user IDs for both participants. However, the code below only calculates the difference between each consecutive message in the conversation. I need the difference between the initial message and the response (i.e. if someone sends multiple initial messages with no response, I need the time between the first message and the first response).

    convonlinetest <- convonline %>%
      arrange(conversation_id, created_at) %>%
      group_by(conversation_id) %>%
      filter(n() > 1) %>%
      mutate(timediff = created_at - lag(created_at))

This is my first question on Stack Overflow, thanks so much in advance for the help!

Edit: Some sample data

    structure(list(conversation_id = c(20000004844375, 20000004844378, 
    20000004913095, 20000004837800, 20000004808210, 20000004808210, 
    20000004837799, 20000004844377, 20000004808210, 20000004846076
    ), user_id = c(-33135869739921264, -33135869739921264, 
    57394627930234816, 
    -33135869739921264, -33135869739921264, -70893327136775872, 
    -33135869739921264, 
    -33135869739921264, -33135869739921264, -33135869739921264), 
    created_at = c("2016-05-31 16:46:27.614", "2016-05-31 16:46:28.387", 
    "2016-07-11 20:20:06.589", "2016-05-27 16:31:05.716", 
    "2016-05-13 12:48:25.125", "2016-05-10 18:58:30.396", 
    "2016-05-27 16:31:05.451", "2016-05-31 16:46:27.981", 
    "2016-05-19 18:43:02.859", "2016-06-01 13:16:26.753"), 
    course_name = c("acct-2020-30i", 
    "acct-2020-30i", "acct-2020-30i", "acct-2020-30i", "acct-2020-30i", 
    "acct-2020-30i", "acct-2020-30i", "acct-2020-30i", "acct-2020-30i", 
    "acct-2020-30i")), row.names = c(NA, 10L), class = "data.frame")

EDIT: Solution Found

I'm smacking myself for not remembering the aggregate function, but it worked out nicely. Thought I'd share for anyone in the future.

    # Column-wise minimum for each conversation/user pair
    new <- aggregate(convonline,
                     by = list(convonline$conversation_id, convonline$user_id),
                     FUN = min)

    final <- new %>%
      mutate(created_at = as.Date(created_at)) %>%  # note: as.Date() keeps only the date
      arrange(conversation_id, created_at) %>%
      group_by(conversation_id) %>%
      mutate(diff = created_at - lag(created_at))
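One thing to watch in the snippet above: `as.Date()` keeps only the calendar date, so the resulting differences are whole days and two messages sent on the same day come out 0 days apart. A minimal sketch of the difference, using two timestamps from the sample data and base R's `as.POSIXct()` (the variable names and the choice of UTC are only for illustration):

```r
# Two timestamps from conversation 20000004808210 in the sample data
first_msg <- "2016-05-10 18:58:30.396"
reply     <- "2016-05-13 12:48:25.125"

# as.Date() drops the time of day, so the gap is a whole number of days
day_gap <- as.numeric(as.Date(reply) - as.Date(first_msg))
day_gap

# as.POSIXct() keeps hours, minutes and seconds (time zone assumed to be UTC)
hour_gap <- difftime(as.POSIXct(reply, tz = "UTC"),
                     as.POSIXct(first_msg, tz = "UTC"),
                     units = "hours")
hour_gap
```

If sub-day precision matters, swapping `as.Date()` for `as.POSIXct()` (or `lubridate::as_datetime()`) in the `mutate()` step should be enough.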
  • Welcome to SO--I think I know how to solve this issue but you need to post some sample data using `dput`. Check out https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Ben G Dec 04 '18 at 20:46
  • Awesome, I'll get that posted for sure. – Brayden Ross Dec 05 '18 at 02:31
  • An image doesn't usually work as sample data. Please read the link posted by Ben G. It has good advice for how to provide data that other R users would find useful. – Z.Lin Dec 06 '18 at 06:19
  • Sorry about that, just put some dput data. – Brayden Ross Dec 06 '18 at 17:58

1 Answer


When I ran your code with one added line that converts the `created_at` column from character to date-time, I got what I believe is the intended result.

    library(dplyr)      # %>%, mutate, arrange, group_by, filter, lag
    library(lubridate)  # great package for handling dates

    # `data` is the sample data frame from the question
    data %>%
      mutate(created_at = as_datetime(created_at)) %>% # NEW ROW OF CODE
      arrange(conversation_id, created_at) %>%
      group_by(conversation_id) %>%
      filter(n() > 1) %>%
      mutate(timediff = created_at - lag(created_at))

    # A tibble: 3 x 5
    # Groups:   conversation_id [1]
      conversation_id  user_id created_at          course_name   timediff       
                <dbl>    <dbl> <dttm>              <chr>         <time>         
    1  20000004808210 -7.09e16 2016-05-10 18:58:30 acct-2020-30i "      NA days"
    2  20000004808210 -3.31e16 2016-05-13 12:48:25 acct-2020-30i 2.742995 days  
    3  20000004808210 -3.31e16 2016-05-19 18:43:02 acct-2020-30i 6.246270 days 
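If what you want is the time from the first message in a conversation to the first reply from a different user (rather than the gap between every consecutive pair of messages), one possible approach is to summarise each conversation instead of lagging row by row. This is only a sketch: the toy data frame `msgs` and the output column names `first_message`, `first_reply` and `response_hours` are made up for illustration, and it assumes the person who sent the earliest message counts as the conversation starter.

```r
library(dplyr)

# Toy data (hypothetical): in conversation 1, user 1 sends two messages
# before user 2 replies; conversation 2 never gets a reply
msgs <- data.frame(
  conversation_id = c(1, 1, 1, 1, 2),
  user_id         = c(1, 1, 2, 1, 3),
  created_at      = as.POSIXct(c("2016-05-10 10:00:00", "2016-05-10 10:05:00",
                                 "2016-05-10 11:00:00", "2016-05-10 12:00:00",
                                 "2016-05-11 09:00:00"), tz = "UTC")
)

first_response <- msgs %>%
  arrange(conversation_id, created_at) %>%
  group_by(conversation_id) %>%
  summarise(
    # earliest message in the conversation
    first_message  = first(created_at),
    # earliest message sent by anyone other than the conversation starter;
    # NA when nobody has replied yet
    first_reply    = created_at[user_id != first(user_id)][1],
    response_hours = as.numeric(difftime(first_reply, first_message,
                                         units = "hours"))
  )

first_response
```

On the toy data, conversation 1 should come out with a response time of 1 hour (10:00 to 11:00, ignoring the unanswered 10:05 follow-up), and conversation 2 with `NA`.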
Ben G
  • Thank you so much, I think that was what I was missing for sure – Brayden Ross Dec 06 '18 at 21:16
  • I think the only issue I'm having while running it is that if someone sends multiple messages before a response, it calculates the time between the messages which haven't been responded to. I think what I'm trying to obtain is the time between the first response and the first message (potentially when the user_id changes?) – Brayden Ross Dec 06 '18 at 21:20
  • Hmm. I'd have to see the data, but you could probably use `distinct` from `dplyr` or `unique` from base. – Ben G Dec 07 '18 at 03:21
  • Thanks so much for the help, Ben. I've added the solution I found above. You pointed me in the right direction! – Brayden Ross Dec 07 '18 at 06:04