3

When there is a gap between commitments of the same type, the fact should contain multiple records that show the accurate start and end date of each continuous commitment. An example of this is patid 1001 and when there is a continuation of the same status without a gap, it should be a single record.

            CREATE TABLE #legal_data (
            ClaimID VARCHAR(20)
            ,dim_legalstat_key int--dimensionkey
            ,[order_start_date] DATE
            ,[order_end_date] DATE
            ,[days_committed]  int----days between order_start_date & order_end_date

            )

            INSERT INTO #legal_data
            VALUES
            ('1001','11','2022-05-11','2022-10-29','171')
            ,('1001','131','2022-07-15','2023-03-19','247')
            ,('1001','116','2023-03-14','2023-03-20','6')
            ,('1001','11','2023-03-20','2023-03-23','3')
            ,('1207','11','2022-09-13','2023-03-12','180')
            ,('1207','11','2023-03-10','2023-03-23','13')
            ,('1924','2','2021-12-18','2022-06-19','183')
            ,('1924','2','2022-06-19','2023-12-20','184')
            ,('1842','77','2021-02-20','2022-06-17','482')
            ,('1842','77','2022-06-18','2023-12-20','550')
            ,('1661','22','2022-02-14','2023-03-20','399')
            ,('1661','22','2022-02-14','2023-03-23','402')
            ,('1553','4','2022-01-14','2022-02-12','29')---
            ,('1553','4','2022-02-14','2023-03-23','402')


            ------desired result

            CREATE TABLE #legal_Result (
            ClaimID VARCHAR(20)
            ,dim_legalstat_key int--dimensionkey
            ,[order_start_date] DATE
            ,[order_end_date] DATE
            ,[days_committed]  int----days between order_start_date & order_end_date

            )

            INSERT INTO #legal_Result
            VALUES
            ('1001','11','2022-05-11','2022-10-29','171')
            ,('1001','131','2022-07-15','2023-03-19','247')
            ,('1001','116','2023-03-14','2023-03-20','6')
            ,('1001','11','2023-03-20','2023-03-23','3')
            ,('1207','11','2022-09-13','2023-03-23','191')
            ,('1924','2','2021-12-18','2023-12-20','732')
            ,('1842','77','2021-02-20','2023-12-20','1033')--not working
            ,('1661','22','2022-02-14','2023-03-23','402') ---
            ,('1553','4','2022-01-14','2022-02-12','29')--anything the 
            ,('1553','4','2022-02-14','2023-03-23','402')

            select * from #legal_data

            select * from #legal_Result
Drdre01
  • 39
  • 5
  • 1
    It looks like this is specifically for SQL Server (the use of #temptable). You may want to tag the question as such to help others. – dougp Mar 24 '23 at 22:36
  • Is shure this test row ``,('1207','11','2022-09-13','2023-03-23','191')`` ? – ValNik Mar 24 '23 at 22:38

2 Answers2

2

This is a gaps and islands question. You have already defined the ranges, now you need to group them, identify the islands, and perform aggregation.

Your expected output appears incorrect. ClaimID 1207 has a minimum date of 2022-03-10, not 2023-09-13. Also, there are 257 days between 2022-07-15 and 2023-03-29. Assuming those are flaws in your question, this code should work for you.

responding to comment

If your ranges are not yet fully consolidated -- a possibility ValNik points out -- you can use this CTE to do that before the Groups CTE:

ranges as (
    select ClaimID
    , dim_legalstat_key
    , MIN(order_start_date) OVER (PARTITION BY ClaimID, dim_legalstat_key ORDER BY order_start_date, order_end_date ROWS BETWEEN 0 FOLLOWING AND UNBOUNDED FOLLOWING) as order_start_date
    , MAX(order_end_date) OVER (PARTITION BY ClaimID, dim_legalstat_key ORDER BY order_start_date, order_end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 0 PRECEDING) as order_end_date
    from #legal_data f
),

Then, of course, the table reference in Groups must change.

end of edit

;
WITH
Groups as (
    SELECT ClaimID
    , dim_legalstat_key
    , order_start_date
    , order_end_date
    , LAG(order_end_date,1) OVER (partition by ClaimID, dim_legalstat_key ORDER BY order_start_date, order_end_date) AS PreviousEndDate
    FROM #legal_data
),
IslandID as (
    SELECT ClaimID
    , dim_legalstat_key
    , order_start_date
    , order_end_date
    , CASE WHEN PreviousEndDate >= order_start_date THEN 0 ELSE 1 END AS IslandStartInd
    , SUM(CASE WHEN PreviousEndDate >= order_start_date THEN 0 ELSE 1 END) OVER ( ORDER BY claimid, dim_legalstat_key, order_start_date, order_end_date) AS IslandId
    FROM Groups
), 
Islands as (
  SELECT ClaimID
  , dim_legalstat_key
  , MIN(order_start_date) AS order_start_date
  , MAX(order_end_date) AS order_end_date
  FROM IslandId
  GROUP BY IslandId
  , ClaimID
  , dim_legalstat_key
)

select *
, datediff(day,order_start_date, order_end_date) as days_committed
from Islands
order by ClaimID
, order_start_date
, order_end_date
dougp
  • 2,810
  • 1
  • 8
  • 31
  • Hi @dougp This was an error, right? , LAG(order_end_date--?--(order_start_date)--?--,1) OVER (partition by ClaimID, dim_legalstat_key ORDER BY order_start_date, order_end_date) AS PreviousStartDate , LAG(order_end_date,1) OVER (partition by ClaimID, dim_legalstat_key ORDER BY order_start_date, order_end_date) AS PreviousEndDate – Drdre01 Jun 01 '23 at 19:58
  • @Drdre01 probably. Doesn't matter. That item isn't used. Deleted that line of code from the answer. – dougp Jun 01 '23 at 20:53
  • I modified the question and would love your contribution. – Drdre01 Jul 21 '23 at 19:07
  • Do you mean the data you recently added to the inputs and desired outputs? I don't understand. There is a 1-day gap between the inputs. That would produce separate rows in the output based on your stated requirements. So your desired output doesn't match your question. – dougp Jul 25 '23 at 23:17
  • yes, the 1-day gap is considered continuous, start date =previous end date is continuous and if the previous end date > start date any other condition should produce a separate row e.g added (1553) – Drdre01 Jul 26 '23 at 22:10
  • That's not what your question says, so I won't update my answer. But I think all you would need to do is change the comparison in the IslandID CTE to something like `PreviousEndDate >= dateadd(day, -1, order_start_date)` – dougp Jul 26 '23 at 22:41
  • is this were you referring I made the change? , CASE WHEN PreviousEndDate >= order_start_date THEN 0 ELSE 1 END AS IslandStartInd , SUM(CASE WHEN PreviousEndDate >= order_start_date THEN 0 ELSE 1 END) OVER ( ORDER BY claimid, dim_legalstat_key, order_start_date, order_end_date) AS IslandId FROM Groups – Drdre01 Jul 27 '23 at 17:11
1

I looked at the solution proposed by @dougp. It works well with "normally" ordered data. I think, problem is with "chaotically" ordered data such as this:

period1 ---------------------------
period2              -----
period3                     ---------------------------

For this question, for example

            ,('2925','5','2022-12-10','2022-12-20','x')
            ,('2925','5','2022-12-15','2022-12-18','x')
            ,('2925','5','2022-12-19','2022-12-29','x')

Query result for this rows

ClaimID dim_legalstat_key order_start_date order_end_date days_committed
2925 5 2022-12-10 2022-12-20 10
2925 5 2022-12-19 2022-12-29 10

Expected result

ClaimID dim_legalstat_key order_start_date order_end_date days_committed
2925 5 2022-12-10 2022-12-29 19

Maybe the data is ordered "correctly" and there will be no such case.

This task is interesting. I see, my view of the solution would be similar to @dougp's solution. Seeing a possible error on the test data, I will propose a recursive solution for consideration.

with ndata as(
select ClaimID,dim_legalstat_key,order_start_date
  ,max(order_end_date)order_end_date
  ,datediff(d,order_start_date,max(order_end_date))days_committed
  ,row_number()over(partition by ClaimId order by order_start_date) rn 
from #legal_data
group by ClaimID,dim_legalstat_key,order_start_date
)
,r as( --Islands head rows
select rn headrow,rn,1 lvl,ClaimID,dim_legalstat_key
       ,order_start_date,order_end_date,days_committed
       ,cast(rn as varchar(1000)) rowlist
from ndata t1
where not exists
    (
     select * from ndata t2 
     where t2.ClaimId=t1.ClaimId and t2.rn<>t1.rn
       and t2.dim_legalstat_key=t1.dim_legalstat_key
       and t2.order_start_date<=t1.order_start_date 
       and t2.order_end_date>=t1.order_start_date 
     )

union all  --iterate through all possible rows

select r.headrow,t2.rn,r.lvl+1 lvl,r.ClaimID,t2.dim_legalstat_key
       ,case when r.order_start_date<t2.order_start_date then r.order_start_date
        else t2.order_start_date end order_start_date
       ,case when r.order_end_date>t2.order_end_date then r.order_end_date
        else t2.order_end_date end order_end_date
       ,r.days_committed
       ,cast(concat(r.rowlist,',',cast(t2.rn as varchar)) as varchar(1000))rowlist
from r inner join ndata t2 
  on  t2.ClaimId=r.ClaimId  and t2.dim_legalstat_key=r.dim_legalstat_key
    and r.rn<>t2.rn
    and charindex(','+cast(t2.rn as varchar)+',',','+r.rowlist+',')=0
    and t2.order_start_date>=r.order_start_date 
    and t2.order_start_date<=r.order_end_date 
 and lvl<100  
)
select ClaimID,dim_legalstat_key
  ,min(order_start_date) order_start_date
  ,max(order_end_date)order_end_date
  ,datediff(d,min(order_start_date),max(order_end_date)) days_committed
from r
group by ClaimID,dim_legalstat_key,headrow
order by ClaimID,dim_legalstat_key

Example

Update1.

For concatenated ranges (start_date=next day from end_date =end_date+1day) I'll add some checkings. (for example case with ClaimId 1842)

First CTE ndata unite rows with the same start_date to 1 row.

Base part of recursion query r selects first row from group of intersected or concatenated rows.
Recursive part of query consecutively combines all other rows from this group.

Corrected query

with ndata as(
select ClaimID,dim_legalstat_key,order_start_date
  ,max(order_end_date)order_end_date
  ,datediff(d,order_start_date,max(order_end_date))days_committed
  ,row_number()over(partition by ClaimId order by order_start_date) rn 
from #legal_data
group by ClaimID,dim_legalstat_key,order_start_date
)
,r as(
select rn headrow,rn,1 lvl,ClaimID,dim_legalstat_key
       ,order_start_date,order_end_date,days_committed
       ,cast(rn as varchar(1000)) rowlist
from ndata t1
where not exists(
  select * from ndata t2 
  where t2.ClaimId=t1.ClaimId and t2.rn<>t1.rn
    and t2.dim_legalstat_key=t1.dim_legalstat_key
    and 
       ( 
        (  t2.order_start_date<=t1.order_start_date 
            and t2.order_end_date>=t1.order_start_date   
        )
        or
        (   -- concatenated date ranges 
           dateadd(day,1,t2.order_end_date)=t1.order_start_date
        )
       )
  )
union all
select r.headrow,t2.rn,r.lvl+1 lvl,r.ClaimID,t2.dim_legalstat_key
       ,case when r.order_start_date<t2.order_start_date then r.order_start_date
        else t2.order_start_date end order_start_date
       ,case when r.order_end_date>t2.order_end_date then r.order_end_date
        else t2.order_end_date end order_end_date
       ,r.days_committed
       ,cast(concat(r.rowlist,',',cast(t2.rn as varchar)) as varchar(1000))rowlist
from r inner join ndata t2 
  on  t2.ClaimId=r.ClaimId  and t2.dim_legalstat_key=r.dim_legalstat_key
    and r.rn<>t2.rn
    and charindex(','+cast(t2.rn as varchar)+',',','+r.rowlist+',')=0
    and t2.order_start_date>=r.order_start_date
    and t2.order_start_date<=dateadd(day,1,r.order_end_date) -- concatenated date ranges
 and lvl<100  
)
--  select * from r;
select ClaimID,dim_legalstat_key
  ,min(order_start_date) order_start_date
  ,max(order_end_date)order_end_date
  ,datediff(d,min(order_start_date),max(order_end_date)) days_committed
from r
group by ClaimID,dim_legalstat_key,headrow
order by ClaimID,dim_legalstat_key

Example here

ValNik
  • 1,075
  • 1
  • 2
  • 6
  • i need assistance if the the previous end_date is just a day from the current start_date e.g ClaimID=1842 as I added to the original question. – Drdre01 Jul 24 '23 at 04:45
  • can you please review my result edit. Thanks – Drdre01 Jul 26 '23 at 22:19
  • for id 1553 - ``should produce a separate row`` - don't understand why. ('1553','4','2022-01-14','2023-02-12','394') intersects with ,('1553','4','2022-02-14','2023-03-23','402') – ValNik Jul 26 '23 at 22:48
  • yes, the 1-day gap is considered continuous, previous end date >= start date is continuous, but for the added (1553) record the previous end date 2023-02-12 is 2 days behind the current start date 2023-02-14. Hence anything 1 day apart is not considered continuous – Drdre01 Jul 27 '23 at 03:55
  • start date is ``'2022-02-14'``, not ``'2023-02-14'`` – ValNik Jul 27 '23 at 13:48
  • yes, it is supposed to be '2022-02-14.' that was an error when I inserted the date. – Drdre01 Jul 27 '23 at 14:47