I have a dataset of hospital claims. Each row is a claim, and I have the following columns: patient id, start date, and end date. Patients can have multiple claims if they visited the hospital multiple times. I'm trying to calculate the total time the patient spent in the hospital based on all the claims in the dataset.
library(tibble)
df <- tribble(
~id, ~start_date, ~end_date,
"100003186", "2011-06-18", "2011-08-09",
"100003186", "2011-06-18", "2011-08-23",
"100003186", "2011-12-14", "2011-12-16",
"100003186", "2014-09-14", "2014-09-17",
"100003186", "2014-09-10", "2014-09-18",
"100003187", "2011-11-18", "2011-11-30",
"100003187", "2011-11-18", "2011-11-23",
)
The problem is that some claims overlap in their dates. For example, for id=="100003186", the first claim is from date 2011-06-18 to 2011-08-09, but this time period is already contained in the second claim, from date 2011-06-18 to 2011-08-23.
How can I delete the rows where the time interval is contained within the interval of another claim for the same individual (id)?
This question offers a possible solution, but I'd like to implement it by id: R: Determine if each date interval overlaps with all other date intervals in a dataframe