1

I have a dataset of hospital claims. Each row is a claim, and I have the following columns: patient id, start date, and end date. Patients can have multiple claims if they visited the hospital multiple times. I'm trying to calculate the total time the patient spent in the hospital based on all the claims in the dataset.

library(tibble)
df <- tribble(
  ~id, ~start_date, ~end_date,
  "100003186", "2011-06-18", "2011-08-09",
  "100003186", "2011-06-18", "2011-08-23",
  "100003186", "2011-12-14", "2011-12-16",
  "100003186", "2014-09-14", "2014-09-17",
  "100003186", "2014-09-10", "2014-09-18",
  "100003187", "2011-11-18", "2011-11-30",
  "100003187", "2011-11-18", "2011-11-23",
)

The problem is that some claims overlap in their dates. For example, for id=="100003186", the first claim is from date 2011-06-18 to 2011-08-09, but this time period is already contained in the second claim, from date 2011-06-18 to 2011-08-23.

How can I delete the rows where the time interval is contained within the interval of another claim for the same individual (id)?

This question offers a possible solution, but I'd like to implement it by id: R: Determine if each date interval overlaps with all other date intervals in a dataframe

dyrland
  • 608
  • 1
  • 7
  • 17
9Lulis9
  • 11
  • 1
  • Edit: I do not want to delete all overlaps. I only want to delete the interval that is contained within another interval. – 9Lulis9 Apr 27 '21 at 21:51
  • Does this answer your question? [How to create a dummy variable in R for dates that lie between a certain interval?](https://stackoverflow.com/questions/67275362/how-to-create-a-dummy-variable-in-r-for-dates-that-lie-between-a-certain-interva) – dash2 Apr 28 '21 at 11:11

3 Answers3

0

You can use %within% and group_by()!

library(tidyverse)
library(purrr)
library(lubridate)
df %>%
  group_by(id) %>%
  mutate(Int = interval(start_date, end_date), 
                within = map(seq_along(Int), function(x){
           y = setdiff(seq_along(Int), x)
           #The interval is within any other intervals (in the group)
           return(any(Int[x] %within% Int[y]))
           
         })
  ) %>% 
  #and now remove those that are within another
  filter(within == FALSE)

Check out the documentation on lubridate, it's full of neat little functions!

dyrland
  • 608
  • 1
  • 7
  • 17
  • 1
    Hi @dyrland, thanks for the answer! However, when running it in the code above, I get the following error. Do you know why this si? Error: Problem with `mutate()` input `overlaps`. x error in evaluating the argument 'b' in selecting a method for function '%within%': object 'y' not found. – 9Lulis9 Apr 27 '21 at 21:28
  • Edit: I also do not want to delete all overlaps. I only want to delete the interval that is contained within another interval. – 9Lulis9 Apr 27 '21 at 21:50
  • Whoops! In clearing out code from the linked answer I cleared out the line that generated `y`. I have edited the code. I added a filter at the end that drops rows with intervals that are completely within another interval. I suppose you could convert that interval to null if you wanted to... – dyrland Apr 28 '21 at 01:13
0

Sort by start date, then look for any that have an end date less than the previous one.

library(dplyr)
df %>% 
      arrange(id, start_date) %>% 
      group_by(id) %>% 
      mutate(contained = end_date <= lag(end_date)) %>%
      filter(!contained | is.na(contained))

This is "weak containment" i.e. it may delete some that have the same start date and/or end date. If you don't want that, adjust the within calculation as appropriate. The is.na call in the last line ensures we don't delete first rows per ID.

dash2
  • 2,024
  • 6
  • 15
0

Although this is an older question, here's a newer option using the IV package dedicated to working with intervals:

library(tidyverse)
library(ivs)

df <- tribble(
  ~id, ~start_date, ~end_date,
  "100003186", "2011-06-18", "2011-08-09",
  "100003186", "2011-06-18", "2011-08-23",
  "100003186", "2011-12-14", "2011-12-16",
  "100003186", "2014-09-14", "2014-09-17",
  "100003186", "2014-09-10", "2014-09-18",
  "100003187", "2011-11-18", "2011-11-30",
  "100003187", "2011-11-18", "2011-11-23",
)

df |> 
  mutate(iv = iv(start_date, end_date)) |> 
  group_by(id) |> 
  summarise(iv = iv_groups(iv), .groups = "drop")
#> # A tibble: 4 × 2
#>   id                              iv
#>   <chr>                    <iv<chr>>
#> 1 100003186 [2011-06-18, 2011-08-23)
#> 2 100003186 [2011-12-14, 2011-12-16)
#> 3 100003186 [2014-09-10, 2014-09-18)
#> 4 100003187 [2011-11-18, 2011-11-30)

Created on 2022-05-27 by the reprex package (v2.0.1)

Carl
  • 4,232
  • 2
  • 12
  • 24