Is there an R function to compare timestamps in two different datasets?

Question

I am new to R and have just started my research work, so please excuse me if the answer is obvious. I have tried to find the answer in other questions, including this similar but not identical one (R Stats: Comparing timestamps in two dataframes), but I am not sure if I am using the right terms.

For my research question we wanted to measure episodes of heart arrhythmia (atrial fibrillation=afib) in patients. We did this using two different methods: ECG and PPG.

Therefore we have two different dataframes per patient.

ECG:

start               | end                   | type
19.10.2020 11:34:53 | 19.10.2020 11:35:24   | noise   
19.10.2020 22:49:53 | 19.10.2020 22:59:53   | Afib
19.10.2020 23:00:21 | 19.10.2020 23:10:53   | Afib
19.10.2020 23:47:14 | 19.10.2020 23:56:22   | Afib

PPG:

start               | end                   | type
19.10.2020 11:25:53 | 19.10.2020 11:40:24   | noise   
19.10.2020 22:49:53 | 19.10.2020 22:59:53   | Afib
19.10.2020 23:00:21 | 19.10.2020 23:15:53   | Afib
19.10.2020 23:42:04 | 19.10.2020 23:54:38   | Afib
20.10.2020 00:02:14 | 20.10.2020 00:19:26   | Afib

Each row represents either one episode of Afib or one episode of noise (signal not good enough for detection). The measurement was continuous, but only arrhythmic events were documented.

We want to compare the second method to the first method to see if it would be a viable alternative to detect heart arrhythmia in patients. Hence we want to find:

true positives: Episodes that were detected by both the gold standard (ECG) and PPG (row 2 in the example above)
false positives: Episodes that were only detected using the PPG method (row 5 in the example above)

and so forth...

Up until now I have converted the timestamps with lubridate's dmy_hms(), so that R treats them as date-times and not just text, with the lines:

ppg$Start<-dmy_hms(ppg$Start, tz=Sys.timezone())
ppg$End<-dmy_hms(ppg$End, tz=Sys.timezone())

leading to:

2020-10-19 22:49:53 | 2020-10-19 22:59:53 | Afib

The condition for a true positive is that an ECG episode overlaps with a PPG episode for at least 30 seconds.

How would I go about implementing this to count true and false positives in R?

Thank you for your help.

You can do an overlap join as described e.g. here: [Overlap join with start and end positions](https://stackoverflow.com/questions/24480031/overlap-join-with-start-and-end-positions) and the **Linked** questions therein. For the true positives, create a set of new start and end columns which includes the 30 s buffer. — Henrik, Dec 30 '21 at 11:04
Thank you very much. I have tried the described method and so far the concept seemed to be working. — Dorothy, Jan 03 '22 at 12:29
Thank you for your feedback. Glad to hear that the post was helpful. The `data.table` non-equi join is really an extremely useful feature. Good luck! — Henrik, Jan 03 '22 at 12:33
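
The 30 s overlap condition discussed in these comments comes down to simple arithmetic on two intervals: the shared time is the earlier of the two ends minus the later of the two starts. Below is a minimal base R sketch of that check on a single pair of episodes. It is not the `data.table` non-equi join itself; `overlap_secs` and `parse_ts` are hypothetical helper names, and parsing the timestamps as UTC is an assumption (the question uses `Sys.timezone()`).

```r
# Hypothetical helper (not from the thread): seconds shared by two intervals.
overlap_secs <- function(start1, end1, start2, end2) {
  shared <- pmin(as.numeric(end1), as.numeric(end2)) -
    pmax(as.numeric(start1), as.numeric(start2))
  pmax(shared, 0)  # disjoint intervals share 0 seconds
}

# Assumption: timestamps are read as UTC.
parse_ts <- function(x) as.POSIXct(x, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# ECG row 4 and PPG row 4 from the question
ecg_start <- parse_ts("19.10.2020 23:47:14")
ecg_end   <- parse_ts("19.10.2020 23:56:22")
ppg_start <- parse_ts("19.10.2020 23:42:04")
ppg_end   <- parse_ts("19.10.2020 23:54:38")

secs <- overlap_secs(ecg_start, ecg_end, ppg_start, ppg_end)
secs        # 444 seconds (23:47:14 to 23:54:38)
secs >= 30  # TRUE, so this pair would count as a true positive
```

Because the helper is vectorised, the same arithmetic can be applied to whole columns once candidate pairs have been produced by an overlap join.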

score 1 · Answer 1 · answered Dec 30 '21 at 17:46

The following function is probably too complicated, but I think it does what the question asks for.
Its input arguments are:

X: an ECG data.frame
Y: a PPG data.frame
duration: minimum overlap duration in seconds (default 30)
startcol: name of the start date-time column
endcol: name of the end date-time column
noisecol: name of the column holding the episode type; rows whose type is in noiseval are left out
noiseval: a vector of type values not to be considered (e.g. "noise")

And the output is a list with members TP and FP.

library(lubridate)  # provides interval(), int_overlaps(), int_start(), int_end(), int_length()

overlapDuration <- function(X, Y, duration = 30, startcol, endcol, noisecol, noiseval){
  overlap_length <- function(x, y){
    if(int_overlaps(x, y)){
      xstart <- int_start(x)
      xend <- int_end(x)
      ystart <- int_start(y)
      yend <- int_end(y)
      start <- max(xstart, ystart)
      end <- min(xend, yend)
      int <- interval(start, end)
      int_length(int)
    } else NA
  }
  xname <- deparse(substitute(X))
  yname <- deparse(substitute(Y))
  Xi <- interval(X[[startcol]], X[[endcol]])
  Yi <- interval(Y[[startcol]], Y[[endcol]])
  overl <- sapply(Yi, \(x){
    sapply(Xi, overlap_length, x)
  })
  i <- which(X[[noisecol]] %in% noiseval)
  j <- which(Y[[noisecol]] %in% noiseval)
  # drop every pair in which either episode is noise, not only noise/noise pairs
  overl[i, ] <- NA
  overl[, j] <- NA
  w <- which(!is.na(overl) & overl >= duration, arr.ind = TRUE)
  colnames(w) <- c(xname, yname)
  TP <- cbind(w, secs = overl[w])
  FP <- which(!(rownames(Y) %in% w[, yname] | Y[[noisecol]] %in% noiseval))
  list(TP = TP, FP = FP)
}

minduration <- 30
start <- "start"
end <- "end"
typecol <- "type"
noise <- "noise"
overlapDuration(ECG, PPG, minduration, start, end, typecol, noise)
#$TP
#     ECG PPG secs
#[1,]   2   2  600
#[2,]   3   3  632
#[3,]   4   4  444
#
#$FP
#[1] 5
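
The call above assumes that `ECG` and `PPG` already exist with date-time columns. As a sketch, the two example tables from the question can be rebuilt in base R as follows. The `parse_ts` helper is a hypothetical name, and parsing as UTC is an assumption made here for reproducibility (the question uses `Sys.timezone()`).

```r
# Assumption: timestamps are read as UTC; adjust tz to match the recordings.
parse_ts <- function(x) as.POSIXct(x, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Gold-standard ECG episodes from the question
ECG <- data.frame(
  start = parse_ts(c("19.10.2020 11:34:53", "19.10.2020 22:49:53",
                     "19.10.2020 23:00:21", "19.10.2020 23:47:14")),
  end   = parse_ts(c("19.10.2020 11:35:24", "19.10.2020 22:59:53",
                     "19.10.2020 23:10:53", "19.10.2020 23:56:22")),
  type  = c("noise", "Afib", "Afib", "Afib")
)

# PPG episodes from the question
PPG <- data.frame(
  start = parse_ts(c("19.10.2020 11:25:53", "19.10.2020 22:49:53",
                     "19.10.2020 23:00:21", "19.10.2020 23:42:04",
                     "20.10.2020 00:02:14")),
  end   = parse_ts(c("19.10.2020 11:40:24", "19.10.2020 22:59:53",
                     "19.10.2020 23:15:53", "19.10.2020 23:54:38",
                     "20.10.2020 00:19:26")),
  type  = c("noise", "Afib", "Afib", "Afib", "Afib")
)

str(ECG)
str(PPG)
```

The column names `start`, `end` and `type` match the values passed as `startcol`, `endcol` and `noisecol` in the call above.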

Thank you so much for this detailed work and the effort you put into it. I really appreciate your help. I will try it out during the following days when my data set is cleaned. I do not fully understand the code right now. But I will get into it, because it looks like it does exactly what I need. Thanks again. — Dorothy, Jan 03 '22 at 12:35
