4

Example Situation: An order system tracks manually entered due dates by recording a system log date that is always unique (this would be a datetime, but I've used dates for simplicity).
I would like to assign a group or section to each due date grouping that exists continuous, chronologically, without changing the due date.

For example, if April 10 is entered as due date, later changed to April 15, and then changed back to April 10 again, this would be 3 distinct groups/sections. In a more complex model, the sectioning would allow deeper investigation as to why due date changes were entered.

Here is a sample table that has 2 orders, each with a similar set of due date changes.

CREATE TABLE #DueDates (OrderNo INT, DueDate Date, SysLogDate Date)
INSERT INTO #DueDates Values (1, '4/10/2022',   '1/10/2022')
                            ,(1, '4/10/2022',   '1/11/2022')
                            ,(1, '4/15/2022',   '1/15/2022')
                            ,(1, '4/15/2022',   '1/16/2022')
                            ,(1, '4/15/2022',   '1/17/2022')
                            ,(1, '4/10/2022',   '1/18/2022')
                            ,(1, '4/10/2022',   '1/19/2022')
                            ,(1, '4/10/2022',   '1/20/2022')
                            ,(2, '4/10/2022',   '2/16/2022')
                            ,(2, '4/10/2022',   '2/17/2022')
                            ,(2, '4/15/2022',   '2/18/2022')
                            ,(2, '4/15/2022',   '2/20/2022')
                            ,(2, '4/15/2022',   '2/21/2022')
                            ,(2, '4/10/2022',   '2/22/2022')
                            ,(2, '4/10/2022',   '2/24/2022')
                            ,(2, '4/10/2022',   '2/26/2022')

The script should be able to assign a section/group to the due dates of both orders at the same time, as follows:

OrderNo DueDate         SysLogDate      SectionNumber_WithinDueDate
1       2022-04-10      2022-01-10      1
1       2022-04-10      2022-01-11      1
1       2022-04-15      2022-01-15      2
1       2022-04-15      2022-01-16      2
1       2022-04-15      2022-01-17      2
1       2022-04-10      2022-01-18      3
1       2022-04-10      2022-01-19      3
1       2022-04-10      2022-01-20      3
2       2022-04-10      2022-02-16      1
2       2022-04-10      2022-02-17      1
2       2022-04-15      2022-02-18      2
2       2022-04-15      2022-02-20      2
2       2022-04-15      2022-02-21      2
2       2022-04-10      2022-02-22      3
2       2022-04-10      2022-02-24      3
2       2022-04-10      2022-02-26      3

Below is my initial attempt, using Dense_Rank():

Select *, Dense_Rank() OVER (Partition By OrderNo, DueDate Order By SysLogDate) as SectionNumber_WithinDueDate
From #DueDates

However, it groups all 4/10 due dates together within each order, instead of grouping them distinctly as two separate groups as above.

Incorrect result example:

OrderNo DueDate     SysLogDate      SectionNumber_WithinDueDate
1       2022-04-10  2022-01-10      1
1       2022-04-10  2022-01-11      2
1       2022-04-10  2022-01-18      3
1       2022-04-10  2022-01-19      4
1       2022-04-10  2022-01-20      5
1       2022-04-15  2022-01-15      1
1       2022-04-15  2022-01-16      2
1       2022-04-15  2022-01-17      3

Thank you for any ideas you may have!

Dale K
  • 25,246
  • 15
  • 42
  • 71
jn4248
  • 105
  • 5

3 Answers3

2

This essentially requires a gaps and islands solution, which typically is done by subtracting a partitioned sequence from a global sequence.

If you don't actually need the section number to be numbered sequentially and just really need to identify the separate groups you can disregard the outer select below.

select OrderNo, DueDate, SysLogDate, 
  dense_rank() over(partition by orderno order by gp) SectionNumber_WithinDueDate
from (
  select *,
    Row_Number() over(partition by OrderNo order by SysLogDate)
    - Row_Number() over(partition by OrderNo, DueDate order by SysLogDate) gp
  from #DueDates
)t
order by OrderNo, SysLogDate;

Demo Fiddle

Stu
  • 30,392
  • 6
  • 14
  • 33
  • 1
    Selecting this as correct due to relative simplicity and the additional option for an even more concise solution that provides unique group identification if sequential group numbering is note necessary. Thank you to all. I've upvoted the responses by Ahmed and KumarHarsh as they also generated the desired results. Thank you to all! – jn4248 Aug 30 '22 at 01:28
1

Try the following:

Select B.OrderNo, B.DueDate, B.SysLogDate, 
       DENSE_RANK() Over (Partition By OrderNo Order By B.grp) SectionNumber_WithinDueDate
From
(
  Select D.OrderNo, D.DueDate, D.SysLogDate, 
        SUM(D.g) Over (Partition By D.OrderNo Order By D.SysLogDate) grp
  From(
        Select OrderNo, DueDate, SysLogDate,
               ABS(DATEDIFF(Day, DueDate, LAG(DueDate, 1, DueDate) Over (Partition By OrderNo Order By SysLogDate))) g
        From #DueDates
       ) D
) B

See a demo from db<>fiddle.

The DATEDIFF is used to check the difference between the current row 'DueDate' and the previous one (using LAG function), once there is a difference a new group edge is created.

The cumulative sum SUM(g) Over... is to define the groups from the group edges.

ahmed
  • 9,071
  • 3
  • 9
  • 22
1

dummy Data,

cREATE TABLE #DueDates (OrderNo INT, DueDate Date, SysLogDate Date)
INSERT INTO #DueDates Values (1, '4/10/2022',   '1/10/2022')
                            ,(1, '4/10/2022',   '1/11/2022')
                            ,(1, '4/15/2022',   '1/15/2022')
                            ,(1, '4/15/2022',   '1/16/2022')
                            ,(1, '4/15/2022',   '1/17/2022')
                            ,(1, '4/10/2022',   '1/18/2022')
                            ,(1, '4/10/2022',   '1/19/2022')
                            ,(1, '4/10/2022',   '1/20/2022')
                            ,(2, '4/10/2022',   '2/16/2022')
                            ,(2, '4/10/2022',   '2/17/2022')
                            ,(2, '4/15/2022',   '2/18/2022')
                            ,(2, '4/15/2022',   '2/20/2022')
                            ,(2, '4/15/2022',   '2/21/2022')
                            ,(2, '4/10/2022',   '2/22/2022')
                            ,(2, '4/10/2022',   '2/24/2022')
                            ,(2, '4/10/2022',   '2/26/2022')

Query,

WITH CTE1
     AS (SELECT d.*,
                CASE
                    WHEN d1.SysLogDate IS NULL
                    THEN d.OrderNo
                    ELSE CAST(d1.orderno AS DECIMAL(3, 1)) + 0.1
                END neworderno
         FROM #DueDates d
              OUTER APPLY
         (
             SELECT TOP 1 SysLogDate, 
                          orderno
             FROM #DueDates d1
             WHERE d.orderno = d1.OrderNo
                   AND d.DueDate < d1.DueDate
                   AND d.SysLogDate > d1.SysLogDate
             ORDER BY d1.SysLogDate DESC
         ) d1)
     SELECT OrderNo, 
            DueDate, 
            SysLogDate, 
            DENSE_RANK() OVER(PARTITION BY orderno
            ORDER BY neworderno, 
                     duedate) SectionNumber_WithinDueDate
     FROM CTE1;

                            drop table #DueDates

Please test this with other sample data,if it do not work then throw that sample data then I will change my logic accordingly.

KumarHarsh
  • 5,046
  • 1
  • 18
  • 22