19

We have appointment table as shown below. Each appointment need to be categorized as "New" or "Followup". Any appointment (for a patient) within 30 days of first appointment (of that patient) is Followup. After 30 days, appointment is again "New". Any appointment within 30 days become "Followup".

I am currently doing this by typing while loop.
How to achieve this without WHILE loop?

enter image description here

Table

CREATE TABLE #Appt1 (ApptID INT, PatientID INT, ApptDate DATE)
INSERT INTO #Appt1
SELECT  1,101,'2020-01-05' UNION
SELECT  2,505,'2020-01-06' UNION
SELECT  3,505,'2020-01-10' UNION
SELECT  4,505,'2020-01-20' UNION
SELECT  5,101,'2020-01-25' UNION
SELECT  6,101,'2020-02-12'  UNION
SELECT  7,101,'2020-02-20'  UNION
SELECT  8,101,'2020-03-30'  UNION
SELECT  9,303,'2020-01-28' UNION
SELECT  10,303,'2020-02-02' 
Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
LCJ
  • 22,196
  • 67
  • 260
  • 418
  • I can't see your image, but I want to confirm, if there are 3 appointments, each 20 days from each other, the last one is still 'follow up' right, because even though it's more than 30 days from the first, it's still less than 20 days from the middle. Is this true? – pwilcox Feb 28 '20 at 19:15
  • @pwilcox No. The third one will be new appointment as shown in the image – LCJ Feb 28 '20 at 19:41
  • While loop over `fast_forward` cursor would probably be your best option, performance wise. – David דודו Markovitz Mar 04 '20 at 07:47

10 Answers10

14

You need to use recursive query.

The 30days period is counted starting from prev(and no it is not possible to do it without recursion/quirky update/loop). That is why all the existing answer using only ROW_NUMBER failed.

WITH f AS (
  SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY PatientId ORDER BY ApptDate) 
  FROM Appt1
), rec AS (
  SELECT Category = CAST('New' AS NVARCHAR(20)), ApptId, PatientId, ApptDate, rn, startDate = ApptDate
  FROM f
  WHERE rn = 1
  UNION ALL
  SELECT CAST(CASE WHEN DATEDIFF(DAY,  rec.startDate,f.ApptDate) <= 30 THEN N'FollowUp' ELSE N'New' END AS NVARCHAR(20)), 
         f.ApptId,f.PatientId,f.ApptDate, f.rn,
         CASE WHEN DATEDIFF(DAY, rec.startDate, f.ApptDate) <= 30 THEN rec.startDate ELSE f.ApptDate END
  FROM rec
  JOIN f
    ON rec.rn = f.rn - 1
   AND rec.PatientId = f.PatientId
)
SELECT ApptId, PatientId, ApptDate, Category
FROM rec
ORDER BY PatientId, ApptDate;  

db<>fiddle demo

Output:

+---------+------------+-------------+----------+
| ApptId  | PatientId  |  ApptDate   | Category |
+---------+------------+-------------+----------+
|      1  |       101  | 2020-01-05  | New      |
|      5  |       101  | 2020-01-25  | FollowUp |
|      6  |       101  | 2020-02-12  | New      |
|      7  |       101  | 2020-02-20  | FollowUp |
|      8  |       101  | 2020-03-30  | New      |
|      9  |       303  | 2020-01-28  | New      |
|     10  |       303  | 2020-02-02  | FollowUp |
|      2  |       505  | 2020-01-06  | New      |
|      3  |       505  | 2020-01-10  | FollowUp |
|      4  |       505  | 2020-01-20  | FollowUp |
+---------+------------+-------------+----------+

How it works:

  1. f - get starting point(anchor - per every PatientId)
  2. rec - recursibe part, if the difference between current value and prev is > 30 change the category and starting point, in context of PatientId
  3. Main - display sorted resultset

Similar class:

Conditional SUM on Oracle - Capping a windowed function

Session window (Azure Stream Analytics)

Running Total until specific condition is true - Quirky update


Addendum

Do not ever use this code on production!

But another option, that is worth mentioning besides using cte, is to use temp table and update in "rounds"

It could be done in "single" round(quirky update):

CREATE TABLE Appt_temp (ApptID INT , PatientID INT, ApptDate DATE, Category NVARCHAR(10))

INSERT INTO Appt_temp(ApptId, PatientId, ApptDate)
SELECT ApptId, PatientId, ApptDate
FROM Appt1;

CREATE CLUSTERED INDEX Idx_appt ON Appt_temp(PatientID, ApptDate);

Query:

DECLARE @PatientId INT = 0,
        @PrevPatientId INT,
        @FirstApptDate DATE = NULL;

UPDATE Appt_temp
SET  @PrevPatientId = @PatientId
    ,@PatientId     = PatientID 
    ,@FirstApptDate = CASE WHEN @PrevPatientId <> @PatientId THEN ApptDate
                           WHEN DATEDIFF(DAY, @FirstApptDate, ApptDate)>30 THEN ApptDate
                           ELSE @FirstApptDate
                      END
    ,Category       = CASE WHEN @PrevPatientId <> @PatientId THEN 'New'
                           WHEN @FirstApptDate = ApptDate THEN 'New'
                           ELSE 'FollowUp' 
                      END
FROM Appt_temp WITH(INDEX(Idx_appt))
OPTION (MAXDOP 1);

SELECT * FROM  Appt_temp ORDER BY PatientId, ApptDate;

db<>fiddle Quirky update

Community
  • 1
  • 1
Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • 1
    your logic looks very similar to mine. Can you describe any significant differences? – pwilcox Mar 02 '20 at 17:19
  • @pwilcox When I wrote this answer every existing one was using simple row_number which was not working, this is why I provided my own version – Lukasz Szozda Mar 02 '20 at 17:20
  • Yeah, I was too quick with the answer. Thx for commenting on that. – Irdis Mar 02 '20 at 23:32
  • 2
    I believe rcte is the only solution for this until SQL server correctly implements `RANGE x PRECEDING` clause. – Salman A Mar 04 '20 at 09:23
  • PS: it is much simpler to move the New-Followup column _outside_ the rcte (compare apptdate = startdate => new). No cast required then. – Salman A Mar 04 '20 at 09:43
  • @Can you please explain why you say "Do not ever use this code on production"? – LCJ Mar 10 '20 at 16:44
  • 1
    @LCJ Quirky update is based on "undocumented" behaviour and it could change in any moment without notice(https://www.red-gate.com/simple-talk/sql/learn-sql-server/robyn-pages-sql-server-cursor-workbench/) – Lukasz Szozda Mar 10 '20 at 16:46
5

You could do this with a recursive cte. You should first order by apptDate within each patient. That can be accomplished by a run-of-the-mill cte.

Then, in the anchor portion of your recursive cte, select the first ordering for each patient, mark the status as 'new', and also mark the apptDate as the date of the most recent 'new' record.

In the recursive portion of your recursive cte, increment to the next appointment, calculate the difference in days between the present appointment and the most recent 'new' appointment date. If it's greater than 30 days, mark it 'new' and reset the most recent new appointment date. Otherwise mark it as 'follow up' and just pass along the existing days since new appointment date.

Finallly, in the base query, just select the columns you want.

with orderings as (

    select       *, 
                 rn = row_number() over(
                     partition by patientId 
                     order by apptDate
                 ) 
    from         #appt1 a

),

markings as (

    select       apptId, 
                 patientId, 
                 apptDate, 
                 rn, 
                 type = convert(varchar(10),'new'),
                 dateOfNew = apptDate
    from         orderings 
    where        rn = 1

    union all
    select       o.apptId, o.patientId, o.apptDate, o.rn,
                 type = convert(varchar(10),iif(ap.daysSinceNew > 30, 'new', 'follow up')),
                 dateOfNew = iif(ap.daysSinceNew > 30, o.apptDate, m.dateOfNew)
    from         markings m
    join         orderings o 
                     on m.patientId = o.patientId 
                     and m.rn + 1 = o.rn
    cross apply  (select daysSinceNew = datediff(day, m.dateOfNew, o.apptDate)) ap

)

select    apptId, patientId, apptDate, type
from      markings
order by  patientId, rn;

I should mention that I initially deleted this answer because Abhijeet Khandagale's answer seemed to meet your needs with a simpler query (after reworking it a bit). But with your comment to him about your business requirement and your added sample data, I undeleted mine because believe this one meets your needs.

pwilcox
  • 5,542
  • 1
  • 19
  • 31
4

I'm not sure that it's exactly what you implemented. But another option, that is worth mentioning besides using cte, is to use temp table and update in "rounds". So we are going to update temp table while all statuses are not set correctly and build result in an iterative way. We can control number of iteration using simply local variable.

So we split each iteration into two stages.

  1. Set all Followup values that are near to New records. That's pretty easy to do just using right filter.
  2. For the rest of the records that dont have status set we can select first in group with same PatientID. And say that they are new since they not processed by the first stage.

So

CREATE TABLE #Appt2 (ApptID INT, PatientID INT, ApptDate DATE, AppStatus nvarchar(100))

select * from #Appt1
insert into #Appt2 (ApptID, PatientID, ApptDate, AppStatus)
select a1.ApptID, a1.PatientID, a1.ApptDate, null from #Appt1 a1
declare @limit int = 0;

while (exists(select * from #Appt2 where AppStatus IS NULL) and @limit < 1000)
begin
  set @limit = @limit+1;
  update a2
  set
    a2.AppStatus = IIF(exists(
        select * 
        from #Appt2 a 
        where 
          0 > DATEDIFF(day, a2.ApptDate, a.ApptDate) 
          and DATEDIFF(day, a2.ApptDate, a.ApptDate) > -30 
          and a.ApptID != a2.ApptID 
          and a.PatientID = a2.PatientID
          and a.AppStatus = 'New'
          ), 'Followup', a2.AppStatus)
  from #Appt2 a2

  --select * from #Appt2

  update a2
  set a2.AppStatus = 'New'
  from #Appt2 a2 join (select a.*, ROW_NUMBER() over (Partition By PatientId order by ApptId) rn from (select * from #Appt2 where AppStatus IS NULL) a) ar
  on a2.ApptID = ar.ApptID
  and ar.rn = 1

  --select * from #Appt2

end

select * from #Appt2 order by PatientID, ApptDate

drop table #Appt1
drop table #Appt2

Update. Read the comment provided by Lukasz. It's by far smarter way. I leave my answer just as an idea.

Irdis
  • 919
  • 9
  • 16
4

I believe the recursive common expression is great way to optimize queries avoiding loops, but in some cases it can lead to bad performance and should be avoided if possible.

I use the code below to solve the issue and test it will more values, but encourage you to test it with your real data, too.

WITH DataSource AS
(
    SELECT *
          ,CEILING(DATEDIFF(DAY, MIN([ApptDate]) OVER (PARTITION BY [PatientID]), [ApptDate]) * 1.0 / 30 + 0.000001) AS [GroupID]
    FROM #Appt1
)
SELECT *
     ,IIF(ROW_NUMBER() OVER (PARTITION BY [PatientID], [GroupID] ORDER BY [ApptDate]) = 1, 'New', 'Followup')
FROM DataSource
ORDER BY [PatientID]
        ,[ApptDate];

enter image description here

The idea is pretty simple - I want separate the records in group (30 days), in which group the smallest record is new, the others are follow ups. Check how the statement is built:

SELECT *
      ,DATEDIFF(DAY, MIN([ApptDate]) OVER (PARTITION BY [PatientID]), [ApptDate])
      ,DATEDIFF(DAY, MIN([ApptDate]) OVER (PARTITION BY [PatientID]), [ApptDate]) * 1.0 / 30
      ,CEILING(DATEDIFF(DAY, MIN([ApptDate]) OVER (PARTITION BY [PatientID]), [ApptDate]) * 1.0 / 30 + 0.000001) 
FROM #Appt1
ORDER BY [PatientID]
        ,[ApptDate];

enter image description here

So:

  1. first, we are getting the first date, for each group and calculating the differences in days with the current one
  2. then, we are want to get groups - * 1.0 / 30 is added
  3. as for 30, 60, 90, etc days we are getting whole number and we wanted to start a new period, I have added + 0.000001; also, we are using ceiling function to get the smallest integer greater than, or equal to, the specified numeric expression

That's it. Having such group we simply use ROW_NUMBER to find our start date and make it as new and leaving the rest as follow ups.

gotqn
  • 42,737
  • 46
  • 157
  • 243
  • 2
    Well, the question is a bit different and this apporach is oversimplification. But it is a nice example how to implement **[tumbling window](https://learn.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics)** – Lukasz Szozda Mar 06 '20 at 15:41
  • It's about performance, too. I believe recursive should be slower. – gotqn Mar 06 '20 at 16:07
3

Although it's not clearly addressed in the question, it's easy to figure out that the appointment dates cannot be simply categorized by 30-day groups. It makes no business sense. And you cannot use the appt id either. One can make a new appointment today for 2020-09-06. Here is how I address this issue. First, get the first appointment, then calculate the date difference between each appointment and the first appt. If it's 0, set to 'New'. If <= 30 'Followup'. If > 30, set as 'Undecided' and do the next round check until there is no more 'Undecided'. And for that, you really need a while loop, but it does not loop through each appointment date, rather only a few datasets. I checked the execution plan. Even though there are only 10 rows, the query cost is significantly lower than that using recursive CTE, but not as low as Lukasz Szozda's addendum method.

IF OBJECT_ID('tempdb..#TEMPTABLE') IS NOT NULL DROP TABLE #TEMPTABLE
SELECT ApptID, PatientID, ApptDate
    ,CASE WHEN (DATEDIFF(DAY, MIN(ApptDate) OVER (PARTITION BY PatientID), ApptDate) = 0) THEN 'New' 
    WHEN (DATEDIFF(DAY, MIN(ApptDate) OVER (PARTITION BY PatientID), ApptDate) <= 30) THEN 'Followup'
    ELSE 'Undecided' END AS Category
INTO #TEMPTABLE
FROM #Appt1

WHILE EXISTS(SELECT TOP 1 * FROM #TEMPTABLE WHERE Category = 'Undecided') BEGIN
    ;WITH CTE AS (
        SELECT ApptID, PatientID, ApptDate 
            ,CASE WHEN (DATEDIFF(DAY, MIN(ApptDate) OVER (PARTITION BY PatientID), ApptDate) = 0) THEN 'New' 
            WHEN (DATEDIFF(DAY, MIN(ApptDate) OVER (PARTITION BY PatientID), ApptDate) <= 30) THEN 'Followup'
            ELSE 'Undecided' END AS Category    
        FROM #TEMPTABLE
        WHERE Category = 'Undecided'
    )
    UPDATE #TEMPTABLE
    SET Category = CTE.Category
    FROM #TEMPTABLE t
        LEFT JOIN CTE ON CTE.ApptID = t.ApptID
    WHERE t.Category = 'Undecided'
END

SELECT ApptID, PatientID, ApptDate, Category 
FROM #TEMPTABLE
Weihui Guo
  • 3,669
  • 5
  • 34
  • 56
3

With due respect to everybody and in IMHO,

There is not much difference between While LOOP and Recursive CTE in terms of RBAR

There is not much performance gain when using Recursive CTE and Window Partition function all in one.

Appid should be int identity(1,1) , or it should be ever increasing clustered index.

Apart from other benefit it also ensure that all successive row APPDate of that patient must be greater.

This way you can easily play with APPID in your query which will be more efficient than putting inequality operator like >,< in APPDate. Putting inequality operator like >,< in APPID will aid Sql Optimizer.

Also there should be two date column in table like

APPDateTime datetime2(0) not null,
Appdate date not null

As these are most important columns in most important table,so not much cast ,convert.

So Non clustered index can be created on Appdate

Create NonClustered index ix_PID_AppDate_App  on APP (patientid,APPDate) include(other column which is not i predicate except APPID)

Test my script with other sample data and lemme know for which sample data it not working. Even if it do not work then I am sure it can be fix in my script logic itself.

CREATE TABLE #Appt1 (ApptID INT, PatientID INT, ApptDate DATE)
INSERT INTO #Appt1
SELECT  1,101,'2020-01-05'  UNION ALL
SELECT  2,505,'2020-01-06'  UNION ALL
SELECT  3,505,'2020-01-10'  UNION ALL
SELECT  4,505,'2020-01-20'  UNION ALL
SELECT  5,101,'2020-01-25'  UNION ALL
SELECT  6,101,'2020-02-12'  UNION ALL
SELECT  7,101,'2020-02-20'  UNION ALL
SELECT  8,101,'2020-03-30'  UNION ALL
SELECT  9,303,'2020-01-28'  UNION ALL
SELECT  10,303,'2020-02-02' 

;With CTE as
(
select a1.* ,a2.ApptDate as NewApptDate
from #Appt1 a1
outer apply(select top 1 a2.ApptID ,a2.ApptDate
from #Appt1 A2 
where a1.PatientID=a2.PatientID and a1.ApptID>a2.ApptID 
and DATEDIFF(day,a2.ApptDate, a1.ApptDate)>30
order by a2.ApptID desc )A2
)
,CTE1 as
(
select a1.*, a2.ApptDate as FollowApptDate
from CTE A1
outer apply(select top 1 a2.ApptID ,a2.ApptDate
from #Appt1 A2 
where a1.PatientID=a2.PatientID and a1.ApptID>a2.ApptID 
and DATEDIFF(day,a2.ApptDate, a1.ApptDate)<=30
order by a2.ApptID desc )A2
)
select  * 
,case when FollowApptDate is null then 'New' 
when NewApptDate is not null and FollowApptDate is not null 
and DATEDIFF(day,NewApptDate, FollowApptDate)<=30 then 'New'
else 'Followup' end
 as Category
from cte1 a1
order by a1.PatientID

drop table #Appt1
KumarHarsh
  • 5,046
  • 1
  • 18
  • 22
2

I hope this will help you.

WITH CTE AS
(
    SELECT #Appt1.*, RowNum = ROW_NUMBER() OVER (PARTITION BY PatientID ORDER BY ApptDate, ApptID) FROM #Appt1
)

SELECT A.ApptID , A.PatientID , A.ApptDate ,
Expected_Category = CASE WHEN (DATEDIFF(MONTH, B.ApptDate, A.ApptDate) > 0) THEN 'New' 
WHEN (DATEDIFF(DAY, B.ApptDate, A.ApptDate) <= 30) then 'Followup' 
ELSE 'New' END
FROM CTE A
LEFT OUTER JOIN CTE B on A.PatientID = B.PatientID 
AND A.rownum = B.rownum + 1
ORDER BY A.PatientID, A.ApptDate
  • Thanks @x00 for editing code in readable format , I am using my cellphone to post answers so was not able to give proper indentations. – Abhijeet Khandagale Mar 02 '20 at 13:37
  • I think this is in essence the right answer. But it's a poor quality answer in that it is not explained and the code has a needless outer query when a modification of the inner portion will do fine. If you can resolve those issues I'll be happy to vote you up. – pwilcox Mar 02 '20 at 14:02
  • 1
    @pwilcox, thanks for the valuable suggestion, I've edited the answer and posted it as of now. As I'm travelling and I don't have laptop with me, I'll be posting explaination in a day or two. – Abhijeet Khandagale Mar 02 '20 at 16:32
  • 1
    @AbhijeetKhandagale This does not meet the business requirement completely. I have added a failing scenario in the question. For patient 303, Feb 2nd appointment should be Followup; but your query tells it is "New" – LCJ Mar 02 '20 at 16:39
1

You could use a Case statement.

select 
      *, 
      CASE 
          WHEN DATEDIFF(d,A1.ApptDate,A2.ApptDate)>30 THEN 'New' 
          ELSE 'FollowUp' 
      END 'Category'
from 
      (SELECT PatientId, MIN(ApptId) 'ApptId', MIN(ApptDate) 'ApptDate' FROM #Appt1 GROUP BY PatientID)  A1, 
      #Appt1 A2 
where 
     A1.PatientID=A2.PatientID AND A1.ApptID<A2.ApptID

The question is, should this category be assigned based off the initial appointment, or the one prior? That is, if a Patient has had three appointments, should we compare the third appointment to the first, or the second?

You problem states the first, which is how I've answered. If that's not the case, you'll want to use lag.

Also, keep in mind that DateDiff makes not exception for weekends. If this should be weekdays only, you'll need to create your own Scalar-Valued function.

user
  • 1,261
  • 2
  • 21
  • 43
  • 1
    This doesn't link two sequential appointments, this links appt 1 to all following appointments and calculates the days-between for all of them. You'd return too many records this way, as appt 1 now has a relation with 2, 3, 4, appt 2 has a relation with 3, 4 ... – steenbergh Feb 28 '20 at 19:28
  • 1
    It does not give expected result. Feb 20 appointment should be "Followup" – LCJ Mar 02 '20 at 04:01
  • The question's unclear... The poster description is this: "Any appointment (for a patient) within 30 days of *first* appointment (of that patient) is Followup. After 30 days, appointment is again "New". Any appointment witin 30 days become "Followup"." January 5th is certainly more than 30 days away from February 20th, i.e. New. It is NOT 30 days away from February 12th, however. I offer a solution to what he wrote, not the table supplied. If the user would like to align with what the table supplies, they should use lag. They should also clarify... – user Mar 02 '20 at 19:51
1

using Lag function


select  apptID, PatientID , Apptdate ,  
    case when date_diff IS NULL THEN 'NEW' 
         when date_diff < 30 and (date_diff_2 IS NULL or date_diff_2 < 30) THEN  'Follow Up'
         ELSE 'NEW'
    END AS STATUS FROM 
(
select 
apptID, PatientID , Apptdate , 
DATEDIFF (day,lag(Apptdate) over (PARTITION BY PatientID order by ApptID asc),Apptdate) date_diff ,
DATEDIFF(day,lag(Apptdate,2) over (PARTITION BY PatientID order by ApptID asc),Apptdate) date_diff_2
  from #Appt1
) SRC

Demo --> https://rextester.com/TNW43808

Digvijay S
  • 2,665
  • 1
  • 9
  • 21
  • 2
    This works on the current sample data but might yield the wrong results given a different sample data. Even if you use `apptDate` as the `order by` column of the `lag` function (which you really should as id is not a guarantee of anything), it can still be easily broken by introducing more followup appointments. See [this Rextester demo](https://rextester.com/PTEZR80440) for example. Good try, though... – Zohar Peled Mar 03 '20 at 09:40
  • Thank you. Should have used date instead of ID. But why it is wrong for apptID = 6 25.01.2020 - 12.02.2020 --> 18 days --> follow up. – Digvijay S Mar 03 '20 at 10:14
  • 2
    Because it should be a `New` and not a `FollowUp`. It's more than 30 days since the first appointment of that patient... You should count 30 days since each `New` appointment and then use a `New` again... – Zohar Peled Mar 03 '20 at 10:17
  • Yes. Thank you. :( Need to create a new to check valid period of date. – Digvijay S Mar 03 '20 at 10:40
1
with cte
as
(
select 
tmp.*, 
IsNull(Lag(ApptDate) Over (partition by PatientID Order by  PatientID,ApptDate),ApptDate) PriorApptDate
 from #Appt1 tmp
)
select 
PatientID, 
ApptDate, 
PriorApptDate, 
DateDiff(d,PriorApptDate,ApptDate) Elapsed,
Case when DateDiff(d,PriorApptDate,ApptDate)>30 
or DateDiff(d,PriorApptDate,ApptDate)=0 then 'New' else 'Followup'   end Category   from cte

Mine is correct. The authors was incorrect, see elapsed

Golden Lion
  • 3,840
  • 2
  • 26
  • 35