1

I am trying to figure out a way to code in R how to join two dataframes to obtain the following:

  1. The time diference between the *FIRST bad report and the first call a teacher takes to call each student´s parent AFTER that first bad report is recorded

  2. make a count of the numbers of calls a teacher has called the parents of a student AFTER the FIRST bad report for each student.

To Ilustrate here are the tables:

DATE_REPORT STUDENT_ID REPORT_CODE
2021-02-23 10:53:34.565 H-007 UDJ-83
2021-02-25 17:02:29.99 H-007 UDJ-84
2021-02-28 17:02:29.100 H-008 UDJ-85

BAD_REPORTS<-data.frame(
  "DATE_REPORT"=c("2021-02-23 10:53:34.565",
                  "2021-02-25 17:02:29.99",
                  "2021-02-28 17:02:29.100"),
  "STUDENT_ID"=c("H-007",
                 "H-007",
                 "H-008"),
  "REPORT_CODE"=c("UDJ-83",
  "UDJ-84","UDJ-85"))

This table shows the dates of the bad report for FEB by STUDENT_ID

the following table shows the records of the calls the teacher has made the same month

DATE_CAL STUDENT_ID
2021-02-10 12:21:07 H-007
2021-02-23 10:26:08 H-007
2021-02-26 11:09:36 H-007
2021-02-26 11:16:46 H-007
2021-02-27 11:16:46 H-008
CALLS_BY_TEACHER<-data.frame(
  "DATE_CAL"=c("2021-02-10 12:21:07",
            "2021-02-23 10:26:08",
            "2021-02-26 11:09:36",
            "2021-02-26 11:16:46",
            "2021-02-27 11:16:46"),
"STUDENT_ID"=c("H-007",
             "H-007",
             "H-007",
             "H-007",
             "H-008"))

Im looking for a table like this:

STUDENT_ID DATE_FIRST_REPORT FIRST_CALL_AFTER_FIRST_REPORT DIF_TIME REPORT_CODE
H-007 2021-02-23 10:53:34.565 2021-02-26 11:09:36 4336.024 UDJ-83
H-007 2021-02-28 17:02:29.100 NA NA UDJ-85

This would be correct because AFTER the first report of this month the teacher called student H-007 on 2021-02-26 11:09:36 (4336.024 mins after the first bad report)

what am I doing?

I have tried to do a full join and then group by STUDENT_ID and REPORT and I select the min(DATE_REPORT) because I know that it has to be the very first date of report but I am kind of lost when I have to code that the first date of call to be taken into consideration will have to be the one RIGHT AFTER the date of the first bad report for each student... so if someone could link some documentation that would be great THANK YOU SO MUCH FOR YOUR HELP

sidenote: If there is no call on the teacher's behalf after a student's frist report than the diff time would be NA

R_Student
  • 624
  • 2
  • 14
  • A time difference of 180,26 min is longer than 12 days, which seems incorrect. – mt1022 Apr 12 '21 at 02:08
  • Yeah! I have corrected the example thank you for pointing that out. I am new to R @mt1022 thank you for helping me out do you know I way to do this using the library tidyverse? – R_Student Apr 12 '21 at 02:15
  • Non-equi join is a special feature of `data.table`. For tidyverse, a possible way is to join by student_id and then filter manually. See here for an example: https://stackoverflow.com/questions/37289405/dplyr-left-join-by-less-than-greater-than-condition – mt1022 Apr 12 '21 at 02:38

1 Answers1

0

Here is a way using data.table:

library(data.table)

setDT(BAD_REPORTS)
setDT(CALLS_BY_TEACHER)

BAD_REPORTS[, c('idate', 'itime') := IDateTime(DATE_REPORT)]
CALLS_BY_TEACHER[, c('idate', 'itime') := IDateTime(DATE_CAL)]

CALLS_BY_TEACHER[BAD_REPORTS[!duplicated(STUDENT_ID)],
    on = .(STUDENT_ID, idate >= idate, itime > itime), mult = 'first'][, .(
        STUDENT_ID,
        DATE_FIRST_REPORT = DATE_REPORT,
        FIRST_CALL_AFTER_FIRST_REPORT = DATE_CAL,
        DIFF_TIME = difftime(DATE_CAL, DATE_REPORT, units='mins'),
        REPORT_CODE)]

#    STUDENT_ID       DATE_FIRST_REPORT FIRST_CALL_AFTER_FIRST_REPORT     DIFF_TIME REPORT_CODE
# 1:      H-007 2021-02-23 10:53:34.565           2021-02-26 11:09:36 4336.024 mins      UDJ-83
# 2:      H-008 2021-02-28 17:02:29.100                          <NA>       NA mins      UDJ-85
mt1022
  • 16,834
  • 5
  • 48
  • 71
  • your code works 10/10 for my problem thank you for showing me a new way of doing things, the link you provided with about dyplr helped me to create a new solution, basically I do a left join to then select call dates that are greater than the report days, and after that I just pick the last call with the last report by group of IDS, just pointing that out if anyone is interested in an alternative solution, THANK YOU so much @mt1022 it was your help, time and effort that allow me to solve this problem and learn!!!! thanks a milllion! you dont know how happy I am thank so much my friend – R_Student Apr 12 '21 at 03:30