7

Considering the follwing:

CREATE TABLE Members (MemberID INT)
INSERT Members VALUES (1001)

CREATE TABLE PCPs (PCPID INT)
INSERT PCPs VALUES (231)
INSERT PCPs VALUES (327)
INSERT PCPs VALUES (390)

CREATE TABLE Plans (PlanID INT)
INSERT Plans VALUES (555)
INSERT Plans VALUES (762)

CREATE TABLE MemberPCP (
    MemberID INT
    , PCP INT
    , StartDate DATETIME
    , EndDate DATETIME)
INSERT MemberPCP VALUES (1001, 231, '2002-01-01', '2002-06-30')
INSERT MemberPCP VALUES (1001, 327, '2002-07-01', '2003-05-31')
INSERT MemberPCP VALUES (1001, 390, '2003-06-01', '2003-12-31')

CREATE TABLE MemberPlans (
    MemberID INT
    , PlanID INT
    , StartDate DATETIME
    , EndDate DATETIME)
INSERT MemberPlans VALUES (1001, 555, '2002-01-01', '2003-03-31')
INSERT MemberPlans VALUES (1001, 762, '2003-04-01', '2003-12-31')

I'm looking for a clean way to construct a timeline for Member/PCP/Plan relationships, where a change in either the PCP or plan for a member would result in a separate start/end row in the result. For example, if over a few years, a member changed their PCP twice and their plan once, but each on different dates, I would see something like the following:

MemberID  PCP  PlanID  StartDate    EndDate
1001      231  555     2002-01-01   2002-06-30
1001      327  555     2002-07-01   2003-03-31
1001      327  762     2003-04-01   2003-05-31
1001      390  762     2003-06-01   2003-12-31

As you can see, I need a separate result row for each date period that involves a difference in the Member/PCP/Plan association. I have a solution in place, but it is very convoluted with a lot of CASE statements and conditional logic in the WHERE clause. I'm just thinking there is a much simpler way to do this.

Thanks.

Rich.Carpenter
  • 1,056
  • 1
  • 9
  • 21
  • Can you post this complicated CASE statement to [SQLFiddle](http://sqlfiddle.com/) so we can see what you've done? – N West Jun 14 '12 at 20:22
  • This is a really complicated thing to do. I don't know if there is a *much simpler way* to do it. So you should probably post your solution and we can help you starting there – Lamak Jun 14 '12 at 20:26
  • 1
    What happens if you have something non-contiguous? Say, what happens if `PCP = 327` starts on `'2002-07-01'` and ends on `'2003-02-28'`? Should a null result be represented in that column? – Clockwork-Muse Jun 14 '12 at 21:34

4 Answers4

2

Compatible with T-SQL. I agree with Glenn on the general approach.

Another suggestion: If you allow hops between periods in your business, this code will need further tweak. Otherwise, I think deferring the EndDate value from next record's StartDate will be better for having more controlled behavior from your code. In that case, you want to ensure the rule before the data get to this query.

Edit: just learned about With statement and SQL Fiddle from Andriy M's post. You can see my answer at SQL Fiddle too.

Edit: Fixed the bug pointed out by Andriy.

WITH StartDates AS (
SELECT MemberId, StartDate FROM MemberPCP UNION
SELECT MemberId, StartDate FROM MemberPlans UNION
SELECT MemberId, EndDate + 1 FROM MemberPCP UNION
SELECT MemberId, EndDate + 1 FROM MemberPlans
),
EndDates AS (
SELECT MemberId, EndDate = StartDate - 1 FROM MemberPCP UNION
SELECT MemberId, StartDate - 1 FROM MemberPlans UNION
SELECT MemberId, EndDate FROM MemberPCP UNION
SELECT MemberId, EndDate FROM MemberPlans
),
Periods AS (
SELECT s.MemberId, s.StartDate, EndDate = min(e.EndDate)
  FROM StartDates s
       INNER JOIN EndDates e
           ON s.StartDate <= e.EndDate
          AND s.MemberId = e.MemberId
 GROUP BY s.MemberId, s.StartDate
)
SELECT MemberId = p.MemberId,
       pcp.PCP, pl.PlanId,
       p.StartDate, p.EndDate
  FROM Periods p
       LEFT JOIN MemberPCP pcp
           -- because of the way we divided period,
           -- there will be one and only one record that fits this join clause
           ON p.StartDate >= pcp.StartDate
          AND p.EndDate <= pcp.EndDate
          AND p.MemberId = pcp.MemberId
       LEFT JOIN MemberPlans pl
           ON p.StartDate >= pl.StartDate
          AND p.EndDate <= pl.EndDate
          AND p.MemberId = pl.MemberId
 ORDER BY p.MemberId, p.StartDate
kennethc
  • 814
  • 1
  • 10
  • 26
  • Doesn't seem to work correctly when the two history tables do not cover the same range of dates. But that may not be required, and otherwise this seems to work fine and is probably more efficient than expanding the ranges and then collapsing them back like in my answer. – Andriy M Jul 15 '12 at 19:08
  • Andriy, I see there was a bug and now corrected. The start date should participate in end date group and vice versa. Otherwise, like you said, the edge period will not be properly detected since there is no corresponding end date (or start date). I changed my SQL Fiddle example to demonstrate that case. – kennethc Jul 19 '12 at 05:16
1

My approach is to take the unique combination of start dates for each member as the starting point and then build out the other pieces of the query from there:

--
-- Traverse down a list of 
-- unique Member ID and StartDates
-- 
-- For each row find the most 
-- recent PCP for that member 
-- which started on or before
-- the start date of the current
-- row in the traversal
--
-- For each row find the most 
-- recent PlanID for that member
-- which started on or before
-- the start date of the current
-- row in the traversal
-- 
-- For each row find the earliest
-- end date for that member
-- (from a collection of unique
-- member end dates) that happened
-- after the start date of the
-- current row in the traversal
-- 
SELECT MemberID,
  (SELECT TOP 1 PCP 
   FROM MemberPCP 
   WHERE MemberID = s.MemberID 
   AND StartDate <= s.StartDate 
   ORDER BY StartDate DESC
  ) AS PCP,
  (SELECT TOP 1 PlanID 
   FROM MemberPlans 
   WHERE MemberID = s.MemberID 
   AND StartDate <= s.StartDate 
   ORDER BY StartDate DESC
  ) AS PlanID,
  StartDate,  
  (SELECT TOP 1 EndDate 
   FROM (
    SELECT MemberID, EndDate 
    FROM MemberPlans 
    UNION 
    SELECT MemberID, EndDate 
    FROM MemberPCP) e
   WHERE EndDate >= s.StartDate 
   ORDER BY EndDate
  ) AS EndDate
FROM ( 
  SELECT
    MemberID,
    StartDate
  FROM MemberPlans
  UNION 
  SELECT
    MemberID,
    Startdate
  FROM MemberPCP
) s
ORDER BY StartDate
8kb
  • 10,956
  • 7
  • 38
  • 50
  • Thanks everyone. All suggestions are great. I tagged this one as the answer, as it does allow for gaps in Plan/PCP activity ranges. – Rich.Carpenter Jul 31 '12 at 19:02
1

As perhaps not the most efficient but at least simple and straightforward solution, I would do the following:

  • 1) expand the ranges;

  • 2) join the expanded ranges;

  • 3) group the results.

This, of course, assumes that only dates are used (i.e. the time part is 00:00 for every StartDate and EndDate in both tables).

To expand date ranges, I prefer using a numbers table, like this:

SELECT
  m.MemberID,
  m.PCP,
  Date = DATEADD(DAY, n.Number, m.StartDate)
FROM MemberPCP m
  INNER JOIN Numbers n
    ON n.Number BETWEEN 0 AND DATEDIFF(DAY, m.StartDate, m.EndDate)

And similarly for MemberPlans.

To produce a combined row set, I would use FULL JOIN, although if you know beforehand that both tables cover exactly the same period of time, INNER JOIN would do just as well:

SELECT *
FROM MemberPCPExpanded pcp
  FULL JOIN MemberPlansExpanded plans
    ON pcp.MemberID = plans.MemberID AND pcp.Date = plans.Date

Now you only need to group the resulting rows and find minimum and maximum dates for every combination of (MemberID, PCP, PlanID):

SELECT
  MemberID  = ISNULL(pcp.MemberID, plans.MemberID),,
  pcp.PCP,
  plans.PlanID,
  StartDate = MIN(ISNULL(pcp.Date, plans.Date)),
  EndDate   = MAX(ISNULL(pcp.Date, plans.Date))
FROM MemberPCPExpanded pcp
  FULL JOIN MemberPlansExpanded plans
    ON pcp.MemberID = plans.MemberID AND pcp.Date = plans.Date
GROUP BY
  ISNULL(pcp.MemberID, plans.MemberID),
  pcp.PCP,
  plans.PlanID

Note that if you use INNER JOIN instead of FULL JOIN, you won't need all those ISNULL() expressions, it would be enough to pick either table's column, e.g. pcp.MemberID instead of ISNULL(pcp.MemberID, plans.MemberID) and pcp.Date instead of ISNULL(pcp.Date, plans.Date).

The complete query might look like this then:

WITH MemberPCPExpanded AS (
  SELECT
    m.MemberID,
    m.PCP,
    Date = DATEADD(DAY, n.Number, m.StartDate)
  FROM MemberPCP m
    INNER JOIN Numbers n
      ON n.Number BETWEEN 0 AND DATEDIFF(DAY, m.StartDate, m.EndDate)
),
MemberPlansExpanded AS (
  SELECT
    m.MemberID,
    m.PlanID,
    Date = DATEADD(DAY, n.Number, m.StartDate)
  FROM MemberPlans m
    INNER JOIN Numbers n
      ON n.Number BETWEEN 0 AND DATEDIFF(DAY, m.StartDate, m.EndDate)
)
SELECT
  MemberID  = ISNULL(pcp.MemberID, plans.MemberID),
  pcp.PCP,
  plans.PlanID,
  StartDate = MIN(ISNULL(pcp.Date, plans.Date)),
  EndDate   = MAX(ISNULL(pcp.Date, plans.Date))
FROM MemberPCPExpanded pcp
  FULL JOIN MemberPlansExpanded plans
    ON pcp.MemberID = plans.MemberID AND pcp.Date = plans.Date
GROUP BY
  ISNULL(pcp.MemberID, plans.MemberID),
  pcp.PCP,
  plans.PlanID
ORDER BY
  MemberID,
  StartDate

You can try this query at SQL Fiddle.

Community
  • 1
  • 1
Andriy M
  • 76,112
  • 17
  • 94
  • 154
0

Maybe this will give some ideas for a start:

SELECT y.memberid, y.pcp, z.planid, x.startdate, x.enddate
  FROM (
        WITH startdates AS (

            SELECT startdate FROM memberpcp
            UNION
            SELECT startdate FROM memberplans
            UNION
            SELECT enddate + 1 FROM memberpcp
            UNION
            SELECT enddate + 1 FROM memberplans

            ), enddates AS (
            SELECT enddate FROM memberpcp
            UNION
            SELECT enddate FROM memberplans

          )

        SELECT s.startdate, e.enddate
          FROM startdates s 
              ,enddates e
          WHERE e.enddate = (SELECT MIN(enddate)
                               FROM enddates
                               WHERE enddate > s.startdate)
       ) x
       ,memberpcp y
       ,memberplans z

  WHERE (y.startdate, y.enddate) = (SELECT startdate, enddate FROM memberpcp WHERE startdate <= x.startdate AND enddate >= x.enddate)
    AND (z.startdate, z.enddate) = (SELECT startdate, enddate FROM memberplans WHERE startdate <= x.startdate AND enddate >= x.enddate)

I ran on Oracle with these results:

1001    231 555 01-JAN-02   30-JUN-02
1001    327 555 01-JUL-02   31-MAR-03
1001    327 762 01-APR-03   31-MAY-03
1001    390 762 01-JUN-03   31-DEC-03

The idea was to first define the different date ranges. That's in the "WITH" clause. Then do a lookup on each range in the other tables. A lot of assumptions here regarding overlapping ranges, etc. But perhaps a start. I tried looking at this without analytic functions since there might not be good support for analytic functions with tsql? I don't know. When building the date ranges for real, the ranges need to be built by memberid as well.

Glenn
  • 8,932
  • 2
  • 41
  • 54