In R: is there a way to flag overlapping date ranges within each specific group in a table? (i.e. by patient ID)

Question

I have a set of hospital admission and discharge dates, broken down by patient ID. There are multiple date ranges per ID and some of them overlap. I am trying to find a way to flag which rows contain overlapping dates, so that when I am calculating 'length of hospital stay' I do not double-count.

So far, I have created an interval variable (discharge date - admission date), and used int_overlaps to flag rows where there are overlaps. This has worked okay, but as well as flagging overlaps, it also flags consecutive stays.

i.e. I want to flag:

Stay A: 2001-10-03 / 2001-10-06

Stay B: 2001-10-04 / 2001-10-11

But I don't want to flag:

Stay A: 2001-10-03 / 2001-10-06

Stay B: 2001-10-06 / 2001-10-11

The code I used was copied from an answer elsewhere on this site, and I don't understand it enough to modify it in the right way (I am an almost total novice at R...!)

This is a simplified example of the df and code....if anyone can advise how I could change it to stop flagging the consecutive stays, I would super appreciate it!!!

ID <- c(1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5)
admdate <- c("2001-10-03", "2001-10-05", "2003-10-04", "2006-02-03", "2006-05-27", "2006-07-01", "2001-08-02", "2008-10-11", "2008-11-01", "2009-01-09", "2009-02-18")
dischdate <- c("2001-10-05", "2001-12-08", "2003-10-04", "2006-05-29", "2006-06-01", "2006-07-07", "2001-08-11", "2008-10-14", "2009-01-13", "2009-01-21", "2009-02-26")

HospAdms <- cbind(ID, admdate, dischdate)
HospAdms <- data.frame(ID, admdate, dischdate)

as_date(HospAdms$admdate)
as_date(HospAdms$dischdate)

HospAdms$Int <- interval(start=HospAdms$admdate, end=HospAdms$dischdate)

HospAdms$overlap <- unlist(tapply(HospAdms$Int,
                                 HospAdms$ID,
                                 function(x) rowSums(outer(x,x,int_overlaps))>1))

In the df that this example code produces, the top two lines are consecutive stays but they are flagged and I don't want them to be. Hope that makes sense!

might be relevant: https://stackoverflow.com/questions/28938147/how-to-flatten-merge-overlapping-time-periods/28938694#28938694 — chinsoon12, Jun 07 '20 at 11:59

Waldi · Accepted Answer · 2020-06-07T14:32:03.130

1

does this answer your question ?

library(data.table)
admissions <- data.table(
  ID = c(1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5),
  admdate = c("2001-10-03", "2001-10-05", "2003-10-04", "2006-02-03", "2006-05-27", "2006-07-01", "2001-08-02", "2008-10-11", "2008-11-01", "2009-01-09", "2009-02-18"),
  dischdate = c("2001-10-05", "2001-12-08", "2003-10-04", "2006-05-29", "2006-06-01", "2006-07-07", "2001-08-11", "2008-10-14", "2009-01-13", "2009-01-21", "2009-02-26")
  )

# Non equi joins are only possible with numeric fields
admissions[,c('start','end'):=.(as.POSIXct(admdate),
                                as.POSIXct(dischdate))]

admissions[admissions, on = .(ID=ID,start<start,end>start ),nomatch = NULL]

edited Jun 07 '20 at 14:32

answered Jun 06 '20 at 18:10

Waldi

39,242
6
30
78

Thanks for this - it works perfectly on the sample data set. However, when I try it on my actual data set, it produces this error: Join results in 22361 rows; more than 8188 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice. – space_canada Jun 07 '20 at 14:25
I tried re-running it with allow.cartesian = TRUE and it flagged an overlap for every single line. Any ideas would be much appreciated?! – space_canada Jun 07 '20 at 14:27
Sorry, I forgot to put ID in the non-equi join! Just edited it : can you try again? – Waldi Jun 07 '20 at 14:31
It works!! Amazing, thank you so much. I have spent SO LONG trying to fix this, I'm very grateful, haha! – space_canada Jun 07 '20 at 14:45

Ben · Answer 2 · 2020-06-06T18:22:52.740

This data.table approach will give you total length of stay for each ID, accounting for gaps and overlap.

With seq.Date you create a sequence of individual dates from admission to discharge, grouped by ID.

uniqueN will give you unique dates across the sequence dates to avoid double counting.

library(data.table)

setDT(HospAdms)[, .(dates = seq.Date(admdate, dischdate, 'day')) , by = .(ID, 1:nrow(HospAdms))
  ][, .(LOS = uniqueN(dates)), by = ID][]

Output

score 0 · Answer 3 · answered Jun 06 '20 at 19:42

I believe this is what you are looking for. As you can see, periods woith a start date on a enddate are not joined together, while 'real' overlapping periods are joined (by ID)

input

   ID    admdate  dischdate
1   1 2001-10-03 2001-10-05
2   1 2001-10-05 2001-12-08
3   2 2003-10-04 2003-10-04
4   3 2006-02-03 2006-05-29
5   3 2006-05-27 2006-06-01
6   3 2006-07-01 2006-07-07
7   4 2001-08-02 2001-08-11
8   5 2008-10-11 2008-10-14
9   5 2008-11-01 2009-01-13
10  5 2009-01-09 2009-01-21
11  5 2009-02-18 2009-02-26

output

   ID    admdate  dischdate
1:  1 2001-10-03 2001-10-05
2:  1 2001-10-05 2001-12-08
3:  3 2006-02-03 2006-06-01
4:  3 2006-07-01 2006-07-07
5:  4 2001-08-02 2001-08-11
6:  5 2008-10-11 2008-10-14
7:  5 2008-11-01 2009-01-21
8:  5 2009-02-18 2009-02-26

code

library( data.table )
library( intervals )
#make it a data.table
setDT(HospAdms)
#add time to dates, and since intervals-package can only 
#  handle numeric intervals, convert the date to numeric
HospAdms[, admdate := as.numeric(as.Date(admdate)) ]
HospAdms[, dischdate := as.numeric(as.Date(dischdate)) ]
#custom function for createing intervals
myfun <- function( y ) {
  data.table::as.data.table( 
    intervals::interval_union(
      intervals::Intervals( as.matrix( y ), 
                            closed = c(FALSE, FALSE) ),  # <--!! the reason why 
                                                         #       matching start-end 
                                                         #       intervals are 
                                                         #       NOT joined
      check_valid = TRUE ),
  )
}

# crate the final output
answer <- HospAdms[, myfun( .SD ), by = .(ID)]
#set numeric intervals back to dates
cols <- c("V1","V2")
answer[, (cols) := lapply( .SD, as.Date, origin = "1970-01-01" ), .SDcols = cols ]
#set colnumn names
setnames(answer, cols, c("admdate", "dischdate") )
#tadaaah
answer

sample data

Thanks for this. However, I'm not looking to join the date ranges (as they can represent info I need, such as transfers to other wards), just to add another column to flag where they overlap. If you can help, I would really appreciate it! — space_canada, Jun 07 '20 at 14:29

In R: is there a way to flag overlapping date ranges within each specific group in a table? (i.e. by patient ID)

3 Answers3