0

I have a data.table in R, within this data.table are some date ranges (from and to date) and the data can be grouped together by an id. For each id, I would like to combine overlapping date ranges to end up with unique date ranges for each id which do not overlap

e.g. this is a very simplified version of my data

input <- data.table(
  id = c(rep(1, 4), rep(2, 3), rep(3, 2)),
  from_date = c('2000-01-01', '2001-01-01', '2002-01-01', '2003-01-01',
                 '2000-01-01', '2001-01-01', '2002-01-01',
                 '2000-01-01', '2001-01-01'),
  to_date = c('2000-06-01', '2003-06-01', '2002-06-01', '2003-10-01',
              '2001-05-01', '2001-02-01', '2002-06-01',
              '2000-06-01', '2002-06-01')
)
> input
   id  from_date    to_date
1:  1 2000-01-01 2000-06-01
2:  1 2001-01-01 2003-06-01
3:  1 2002-01-01 2002-06-01
4:  1 2003-01-01 2003-10-01
5:  2 2000-01-01 2001-05-01
6:  2 2001-01-01 2001-02-01
7:  2 2002-01-01 2002-06-01
8:  3 2000-01-01 2000-06-01
9:  3 2001-01-01 2002-06-01

For id 1, the second, third and fouth date ranges overlap, so I would need to combine them all into one with the from and to date reflecting the min/max date in the range. For id 2 the second and third date range overlap (indeed, the second date range contains the third date range entirely) so again these would need combining. For the third id they do not overlap so can remain the same

So for the above, i would want to end up with

> output
   id  from_date    to_date
1:  1 2000-01-01 2000-06-01
2:  1 2001-01-01 2003-10-01
3:  2 2000-01-01 2001-05-01
4:  2 2002-01-01 2002-06-01
5:  3 2000-01-01 2000-06-01
6:  3 2001-01-01 2002-06-01

In reality by data.table is a lot bigger (1,000,000s rows with 100,000s groups) and contains other data along with the id and date ranges, and so it is important that any solution is reasonably performant and fast, ideally using data.table rather than dplyr for example, as the rest of the code uses data.tables.

Thanks in advance

user1165199
  • 6,351
  • 13
  • 44
  • 60
  • What happens to the other (not date or id) columns when you merge the rows by date? – jblood94 May 17 '23 at 16:12
  • I added an answer to https://stackoverflow.com/questions/28938147/how-to-flatten-merge-overlapping-time-periods that addresses performance. – jblood94 May 18 '23 at 13:00

0 Answers0