1

This is slightly different than the usual gaps-and-islands SQL problem. I have a table that corresponds to the date each person received a batch of medication, and the number of days supply of that medication. This means each row could be viewed as a date range for which the person had medication. A person may receive multiple batches of medication on the same day, and may also receive a new batch of medication when they still have some remaining. I need to find the start and end dates for contiguous date ranges when the person had medication. For example, imagine the input was the table below (which only includes a single person with a single kind of medication):

CREATE TABLE Supply (
  start_dt DATE NOT NULL,
  days_supply INT NOT NULL;
);

INSERT INTO Supply VALUES
  ('2022-01-01', 7),
  ('2022-01-01', 7),
  ('2022-01-12', 7),
  ('2022-01-28', 7),
  ('2022-02-04', 2),
  ('2022-03-01', 7);

The desired output would look like this:

start_dt end_dt
2022-01-01 2022-01-21
2022-01-28 2022-02-05
2022-03-01 2022-03-07

I've looked at the usual gaps-and-islands solutions to resolving overlaps between date ranges, but in those, the amount of overlap doesn't matter. In this problem, any overlap effectively extends the end date of the contiguous medication date range.

  • Please tag your dbms. Anything with dates is very specific. Also, it seems like you're missing an id column for the person that gets the medication or is the entire thing for one preson? – DannySlor Apr 26 '23 at 16:31
  • It's a toy example that only includes one person and one kind of medication. I'm keeping it as simple as possible. – Whiskeyjack Apr 26 '23 at 16:34
  • This requires a recursive query, which I am not sure that Apache Spark SQL supports... – GMB Apr 26 '23 at 16:50
  • Does this answer your question? [recursive cte in spark SQL](https://stackoverflow.com/questions/52562607/recursive-cte-in-spark-sql) – nbk Apr 26 '23 at 17:04

0 Answers0