4

I have a table that contains a record for each ward stay within a hospital spell (note: a spell can include transfers to other hospitals). Spellno is the unique identifier of a spell. I would like to aggregate consecutive ward stays within a spell to hospital level. The problem I have is that if a patient goes from hospital1 to hospital2 and back to hospital1 a GROUP BY 'Spellno' and 'Hospital' would combine the two hospital1 stays, which I don't want to do.

e.g. if this was my data:

Spellno   Hospital   WardCode   WardStart   WardEnd 
-------------------------------------------------------------------
123       hosp1      ward1      01/04/2015  03/04/2015
123       hosp1      ward4      03/04/2015  05/04/2015
123       hosp2      ward2      05/04/2015  07/04/2015
123       hosp1      ward3      07/04/2015  10/04/2015
123       hosp1      ward1      10/04/2015  12/04/2015

I want to aggregate on Spellno and Hospital to get:

Spellno   Hospital   WardStart   WardEnd 
-------------------------------------------------------------------
123       hosp1      01/04/2015  05/04/2015
123       hosp2      05/04/2015  07/04/2015
123       hosp1      07/04/2015  12/04/2015

Many thanks in advance.

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
MarkB
  • 45
  • 4

2 Answers2

4

You can use subquery in WHERE clause to filter out overlapping dates ranges and second subquery in SELECT to get range end.

SELECT Spellno, Hospital,D.WardStart,
   (SELECT Min(E.WardEnd)
    FROM #tab E
    WHERE E.WardEnd >= D.WardEnd
      AND E.Spellno = D.Spellno
      AND E.Hospital = D.Hospital
      AND NOT EXISTS (SELECT 1
                      FROM #tab E2
                      WHERE E.WardStart < E2.WardStart
                        AND E.WardEnd >= E2.WardStart
                        AND D.Spellno = E2.Spellno
                        AND D.Hospital = E2.Hospital)
  ) AS WardEnd
FROM #tab D
WHERE NOT EXISTS (SELECT 1
                  FROM #tab D2
                  WHERE D.WardStart <= D2.WardEnd
                    AND D.WardEnd > D2.WardEnd
                    AND D.Spellno = D2.Spellno
                    AND D.Hospital = D2.Hospital)

Warning:

This query performance may not be the best but it would do the work.

LiveDemo

Output:

╔═════════╦══════════╦═════════════════════╦═════════════════════╗
║ Spellno ║ Hospital ║      WardStart      ║       WardEnd       ║
╠═════════╬══════════╬═════════════════════╬═════════════════════╣
║     123 ║ hosp1    ║ 2015-04-01 00:00:00 ║ 2015-04-05 00:00:00 ║
║     123 ║ hosp2    ║ 2015-04-05 00:00:00 ║ 2015-04-07 00:00:00 ║
║     123 ║ hosp1    ║ 2015-04-07 00:00:00 ║ 2015-04-12 00:00:00 ║
╚═════════╩══════════╩═════════════════════╩═════════════════════╝
Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • This seems to be working perfectly (takes 13 seconds on my 440,000 record table). Only thing I don't understand is how the subquery in the SELECT works. – MarkB Oct 27 '15 at 13:08
  • @MarkB The subquery in WHERE will get you correct WardStart and eliminate overlapping records. The subquery in SELECT will get the last `WardEnd` to enclose the time period. You can play with it `SELECT Spellno, Hospital,D.WardStart, D.WardEnd`. Keep in mind that both queries are correlated to main query – Lukasz Szozda Oct 27 '15 at 13:16
  • I get that, I'm just trying to unpick it so I understand how it works. I'll get there. Also, just realised that this falls over when the last ward stay has a NULL end date, i.e. the patient is still in hospital. I will post the answer when I have it. – MarkB Oct 27 '15 at 13:23
  • @MarkB Now you are changing the rules of game. `NULL` in WardEnd wasn't in original question. For future add this to your question because it is very important. – Lukasz Szozda Oct 27 '15 at 13:25
  • Yes, this was an oversight on my part. Could I get round it by enclosing each WardEnd in an ISNULL e.g. ISNULL(E.WardEnd, 2099-01-01') >= E2.WardStart? – MarkB Oct 27 '15 at 13:31
  • @MarkB Yes I was thinking about adding ISNULL too. But I see you've got it already. Anyway please for future be more specific – Lukasz Szozda Oct 27 '15 at 13:32
1

I'm assuming that the (WardStart, WardEnd) date ranges are strictly consecutive with no overlapping. For simplicity's sake, I'm also assuming that consecutive ranges don't exceed the max recursion default.

This can be solved using recursive SQL:

WITH 
  data AS (
    SELECT * 
    FROM (
      VALUES (123, 'hosp1', 'ward1', CAST('2015-04-01' AS DATE), CAST('2015-04-03' AS DATE)),
             (123, 'hosp1', 'ward4', CAST('2015-04-03' AS DATE), CAST('2015-04-05' AS DATE)),
             (123, 'hosp2', 'ward2', CAST('2015-04-05' AS DATE), CAST('2015-04-07' AS DATE)),
             (123, 'hosp1', 'ward3', CAST('2015-04-07' AS DATE), CAST('2015-04-10' AS DATE)),
             (123, 'hosp1', 'ward1', CAST('2015-04-10' AS DATE), CAST('2015-04-12' AS DATE))
    ) AS t(Spellno, Hospital, WardCode, WardStart, WardEnd)
  ),
  consecutive(Spellno, Hospital, WardStart, WardEnd) AS (
    SELECT Spellno, Hospital, WardStart, WardEnd
    FROM data AS d1
    WHERE NOT EXISTS (
      SELECT *
      FROM data AS d2
      WHERE d1.Spellno = d2.Spellno
      AND d1.Hospital = d2.Hospital
      AND d1.WardStart = d2.WardEnd
    )
    UNION ALL
    SELECT c.Spellno, c.Hospital, c.WardStart, d.WardEnd
    FROM consecutive AS c
    JOIN data AS d
    ON c.Spellno = d.Spellno
    AND c.Hospital = d.Hospital
    AND c.WardEnd = d.WardStart
  )
SELECT Spellno, Hospital, WardStart, MAX(WardEnd)
FROM consecutive
GROUP BY Spellno, Hospital, WardStart
ORDER BY Spellno, WardStart

Demo

Explanation

The first subquery in the recursive CTE consecutive initialises the recursion to start with all rows for which there isn't any "previous row" for the same (Spellno, Hospital). This produces:

Spellno  Hospital  WardStart   WardEnd
-----------------------------------------
123      hosp1     2015-04-01  2015-04-03
123      hosp2     2015-04-05  2015-04-07
123      hosp1     2015-04-07  2015-04-10

The recursion then produces a new row with the previous row's WardStart (which is always the same for consecutive rows) and the current WardEnd. This produces:

Spellno  Hospital  WardStart   WardEnd
-----------------------------------------
123      hosp1     2015-04-01  2015-04-03 <-- Unwanted, "intermediary" row
123      hosp1     2015-04-01  2015-04-05
123      hosp2     2015-04-05  2015-04-07
123      hosp1     2015-04-07  2015-04-10 <-- Unwanted, "intermediary" row
123      hosp1     2015-04-07  2015-04-12

Finally, in the outer query, we select only the maximum value of WardEnd for each consecutive series, producing the wanted result:

Spellno  Hospital  WardStart   WardEnd
-----------------------------------------
123      hosp1     2015-04-01  2015-04-05
123      hosp2     2015-04-05  2015-04-07
123      hosp1     2015-04-07  2015-04-12
Community
  • 1
  • 1
Lukas Eder
  • 211,314
  • 129
  • 689
  • 1,509
  • This won't work see https://data.stackexchange.com/stackoverflow/query/380956/consecutive-date-ranges for hosp2 – Lukasz Szozda Oct 27 '15 at 09:35
  • Going to need to few minutes to get my head around this but many thanks for the quick response! – MarkB Oct 27 '15 at 09:36
  • @LukasEder What if I change wards to `hosp2 ward1` like here https://data.stackexchange.com/stackoverflow/query/380957/consecutive-date-ranges this is analogic to `hosp1` – Lukasz Szozda Oct 27 '15 at 09:37
  • @lad2025: I made some assumptions about the data. I think your suggested data makes no sense in the context of the requirements. A patient / spellno cannot be in two different hospitals at the same time. If that assumption is wrong, then I'm happy to read the OP's updated question. – Lukas Eder Oct 27 '15 at 09:40
  • Changed different spellno, different hospital https://data.stackexchange.com/stackoverflow/query/380958/consecutive-date-ranges, **still the same**. – Lukasz Szozda Oct 27 '15 at 09:43
  • @lad2025: Now we have a gap within the `spellno = 123` *and* overlapping date ranges for `(124, 'hosp2')` What is it you're trying to say here? :) – Lukas Eder Oct 27 '15 at 09:44
  • 1
    @LukasEder: your assumptions are correct. gaps or overlaps within a spellno would indicate corrupt data – MarkB Oct 27 '15 at 09:53
  • 1
    @lad2025... Anyway, nonetheless, I find your solution more elegant for this specific question – Lukas Eder Oct 27 '15 at 10:00