
I am new to R and have just started my research work, so please excuse me if the answer is obvious. I have tried to find the answer in other questions, including this similar but not identical one (R Stats: Comparing timestamps in two dataframes), but I am not sure I am using the right search terms.

For my research question we want to measure episodes of heart arrhythmia (atrial fibrillation, Afib) in patients. We did this using two different methods: ECG and PPG.

We therefore have two data frames per patient.

ECG:

start               | end                   | type
19.10.2020 11:34:53 | 19.10.2020 11:35:24   | noise   
19.10.2020 22:49:53 | 19.10.2020 22:59:53   | Afib
19.10.2020 23:00:21 | 19.10.2020 23:10:53   | Afib
19.10.2020 23:47:14 | 19.10.2020 23:56:22   | Afib

PPG:

start               | end                   | type
19.10.2020 11:25:53 | 19.10.2020 11:40:24   | noise   
19.10.2020 22:49:53 | 19.10.2020 22:59:53   | Afib
19.10.2020 23:00:21 | 19.10.2020 23:15:53   | Afib
19.10.2020 23:42:04 | 19.10.2020 23:54:38   | Afib
20.10.2020 00:02:14 | 20.10.2020 00:19:26   | Afib

Each row represents either one episode of Afib or one episode of noise (signal not good enough for detection). The measurement was continuous, but only arrhythmic events were documented.

We want to compare the second method to the first method to see if it would be a viable alternative to detect heart arrhythmia in patients. Hence we want to find:

  • true positives: Episodes which were detected by both the gold standard (ECG) and PPG (row 2 in the example above)

  • false positives: Episodes that were only detected using the PPG method. (row 5 in the example above)

and so forth...

So far I have converted the timestamps so that R treats them as date-times rather than text, using these lines:

library(lubridate)  # provides dmy_hms()
ppg$Start <- dmy_hms(ppg$Start, tz = Sys.timezone())
ppg$End <- dmy_hms(ppg$End, tz = Sys.timezone())

leading to:

2020-10-19 22:49:53 | 2020-10-19 22:59:53 | Afib

A PPG episode counts as a true positive if it overlaps with an ECG episode for at least 30 seconds.
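To make the 30-second condition concrete, here is a minimal base-R sketch for a single pair of episodes (row 4 of each table above). The helper names `parse_ts` and `overlap_secs` are hypothetical, and the time zone is fixed to UTC only so the example is reproducible.

```r
# Parse day.month.year timestamps; UTC is an assumption for reproducibility.
parse_ts <- function(x) as.POSIXct(x, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Seconds shared by two episodes: earliest end minus latest start,
# clipped at 0 when the episodes do not overlap.
# (overlap_secs is an illustrative helper, not part of any package.)
overlap_secs <- function(start1, end1, start2, end2) {
  secs <- pmin(as.numeric(end1), as.numeric(end2)) -
          pmax(as.numeric(start1), as.numeric(start2))
  pmax(secs, 0)
}

ecg_start <- parse_ts("19.10.2020 23:47:14")
ecg_end   <- parse_ts("19.10.2020 23:56:22")
ppg_start <- parse_ts("19.10.2020 23:42:04")
ppg_end   <- parse_ts("19.10.2020 23:54:38")

secs <- overlap_secs(ecg_start, ecg_end, ppg_start, ppg_end)
secs        # 444 seconds (23:47:14 to 23:54:38)
secs >= 30  # TRUE: this pair would satisfy the condition
```

Because the helper only uses `pmin()` and `pmax()`, it also accepts whole columns of start and end times.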

How would I implement this in R to count true and false positives?

Thank you for your help.

Rui Barradas
Dorothy
  • You can do an overlap join as described e.g. here: [Overlap join with start and end positions](https://stackoverflow.com/questions/24480031/overlap-join-with-start-and-end-positions) and in the questions **Linked** therein. For the true positives, create a set of new start and end columns which include the 30 s buffer. – Henrik Dec 30 '21 at 11:04
  • Thank you very much. I have tried the described method and so far the concept seemed to be working. – Dorothy Jan 03 '22 at 12:29
  • Thank you for your feedback. Glad to hear that the post was helpful. The `data.table` non-equi join is really an extremely useful feature. Good luck! – Henrik Jan 03 '22 at 12:33

1 Answer


The following function is probably more complicated than necessary, but I think it does what the question asks for.
Its input arguments are:

  • `X`: the ECG data.frame
  • `Y`: the PPG data.frame
  • `duration`: minimum overlap duration in seconds (default 30)
  • `startcol`: name of the start date-time column
  • `endcol`: name of the end date-time column
  • `noisecol`: name of the column holding the episode type; rows whose type is in `noiseval` are left out
  • `noiseval`: a vector of type values not to be considered (e.g. "noise")

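The output is a list with members TP (a matrix of matching ECG and PPG row numbers with the overlap in seconds) and FP (the PPG row numbers without a qualifying ECG match).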

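library(lubridate)  # interval(), int_overlaps(), int_start(), int_end(), int_length()

overlapDuration <- function(X, Y, duration = 30, startcol, endcol, noisecol, noiseval){
  # Overlap of two intervals in seconds, NA if they do not overlap
  overlap_length <- function(x, y){
    if(int_overlaps(x, y)){
      xstart <- int_start(x)
      xend <- int_end(x)
      ystart <- int_start(y)
      yend <- int_end(y)
      start <- max(xstart, ystart)
      end <- min(xend, yend)
      int <- interval(start, end)
      int_length(int)
    } else NA
  }
  # Names of the data.frames as passed in, used to label the TP columns
  xname <- deparse(substitute(X))
  yname <- deparse(substitute(Y))
  Xi <- interval(X[[startcol]], X[[endcol]])
  Yi <- interval(Y[[startcol]], Y[[endcol]])
  # Overlap matrix: rows are episodes of X, columns are episodes of Y
  # (the \(x) lambda syntax needs R >= 4.1)
  overl <- sapply(Yi, \(x){
    sapply(Xi, overlap_length, x)
  })
  # Exclude every pair in which either episode is noise
  i <- which(X[[noisecol]] %in% noiseval)
  j <- which(Y[[noisecol]] %in% noiseval)
  overl[i, ] <- NA
  overl[, j] <- NA
  # True positives: pairs overlapping for at least `duration` seconds
  w <- which(!is.na(overl) & overl >= duration, arr.ind = TRUE)
  colnames(w) <- c(xname, yname)
  TP <- cbind(w, secs = overl[w])
  # False positives: non-noise rows of Y without any qualifying match in X
  FP <- which(!(rownames(Y) %in% w[, yname] | Y[[noisecol]] %in% noiseval))
  list(TP = TP, FP = FP)
}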

minduration <- 30
start <- "start"
end <- "end"
typecol <- "type"
noise <- "noise"
overlapDuration(ECG, PPG, minduration, start, end, typecol, noise)
#$TP
#     ECG PPG secs
#[1,]   2   2  600
#[2,]   3   3  632
#[3,]   4   4  444
#
#$FP
#[1] 5
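
The question also asks about the remaining categories ("and so forth"). As a rough cross-check in base R only, and not the method used above, the overlap matrix can be built with `outer()` and false negatives counted alongside true and false positives. This is a sketch under stated assumptions: the data frames are rebuilt from the question's tables, the time zone is fixed to UTC, noise rows are dropped beforehand, and an ECG episode with no PPG overlap of at least 30 seconds is counted as a false negative. The names `parse_ts`, `hit` and `counts` are illustrative.

```r
# Rebuild the example data from the question (UTC is an assumption).
parse_ts <- function(x) as.POSIXct(x, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

ECG <- data.frame(
  start = parse_ts(c("19.10.2020 11:34:53", "19.10.2020 22:49:53",
                     "19.10.2020 23:00:21", "19.10.2020 23:47:14")),
  end   = parse_ts(c("19.10.2020 11:35:24", "19.10.2020 22:59:53",
                     "19.10.2020 23:10:53", "19.10.2020 23:56:22")),
  type  = c("noise", "Afib", "Afib", "Afib")
)
PPG <- data.frame(
  start = parse_ts(c("19.10.2020 11:25:53", "19.10.2020 22:49:53",
                     "19.10.2020 23:00:21", "19.10.2020 23:42:04",
                     "20.10.2020 00:02:14")),
  end   = parse_ts(c("19.10.2020 11:40:24", "19.10.2020 22:59:53",
                     "19.10.2020 23:15:53", "19.10.2020 23:54:38",
                     "20.10.2020 00:19:26")),
  type  = c("noise", "Afib", "Afib", "Afib", "Afib")
)

# Assumption: noise episodes are ignored entirely.
ecg <- ECG[ECG$type == "Afib", ]
ppg <- PPG[PPG$type == "Afib", ]

# Overlap in seconds for every pair (rows: ECG, columns: PPG);
# negative values mean the two episodes do not overlap.
overl <- outer(as.numeric(ecg$end),   as.numeric(ppg$end),   pmin) -
         outer(as.numeric(ecg$start), as.numeric(ppg$start), pmax)
hit <- overl >= 30

counts <- c(
  TP = sum(rowSums(hit) > 0),   # ECG episodes confirmed by PPG
  FN = sum(rowSums(hit) == 0),  # ECG episodes missed by PPG
  FP = sum(colSums(hit) == 0)   # PPG episodes without an ECG match
)
counts
# TP = 3, FN = 0, FP = 1 for the example data
```

Whether true positives should be counted per ECG episode, per PPG episode, or per matching pair is a study-design decision; the sketch counts them per ECG episode.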
Rui Barradas
  • Thank you so much for this detailed work and the effort you put into it. I really appreciate your help. I will try it out during the following days when my data set is cleaned. I do not fully understand the code right now. But I will get into it, because it looks like it does exactly what I need. Thanks again. – Dorothy Jan 03 '22 at 12:35