1

This is a follow-up question to my initial post

Example Situation: An order system tracks manually entered due dates by recording a system log date that is always unique (this would be a datetime, but I've used dates for simplicity, making each unique).

I would like to assign a section number to each due date grouping where the due date remains the same chronologically.

Stu's response solved the table in my initial post, but I've noticed that if I replace the 4/15/2022 due date associated with SysLogDate of 1/16/2022 to be 4/13/2022, the desired ordering does not seem to be maintained:

Note: 4/13/2022 date is an arbitrary change. The same issue occurs if I use any other unique date that is not yet already in the DueDate column. Ultimately, I also need to be able to handle changes to/from NULL, where someone 'forgets' to enter the date, but replacing the date with NULL also yields the same issue.

Updated Table:

CREATE TABLE #DueDates (OrderNo INT, DueDate Date, SysLogDate Date)
INSERT INTO #DueDates Values (1, '4/10/2022',   '1/10/2022')
                            ,(1, '4/10/2022',   '1/11/2022')
                            ,(1, '4/15/2022',   '1/15/2022')
                            ,(1, '4/13/2022',   '1/16/2022')  -- Due Date Altered since prior post
                            ,(1, '4/15/2022',   '1/17/2022')
                            ,(1, '4/10/2022',   '1/18/2022')
                            ,(1, '4/10/2022',   '1/19/2022')
                            ,(1, '4/10/2022',   '1/20/2022')
                            ,(2, '4/10/2022',   '2/16/2022')
                            ,(2, '4/10/2022',   '2/17/2022')
                            ,(2, '4/15/2022',   '2/18/2022')
                            ,(2, '4/15/2022',   '2/20/2022')
                            ,(2, '4/15/2022',   '2/21/2022')
                            ,(2, '4/10/2022',   '2/22/2022')
                            ,(2, '4/10/2022',   '2/24/2022')
                            ,(2, '4/10/2022',   '2/26/2022') 

Desired Results Are:

    OrderNo DueDate         SysLogDate      SectionNumber_WithinDueDate
    1       2022-04-10      2022-01-10      1
    1       2022-04-10      2022-01-11      1
    1       2022-04-15      2022-01-15      2
    1       2022-04-13      2022-01-16      3
    1       2022-04-15      2022-01-17      4
    1       2022-04-10      2022-01-18      5
    1       2022-04-10      2022-01-19      5
    1       2022-04-10      2022-01-20      5
    2       2022-04-10      2022-02-16      1
    2       2022-04-10      2022-02-17      1
    2       2022-04-15      2022-02-18      2
    2       2022-04-15      2022-02-20      2
    2       2022-04-15      2022-02-21      2
    2       2022-04-10      2022-02-22      3
    2       2022-04-10      2022-02-24      3
    2       2022-04-10      2022-02-26      3

...but applying the solution from my prior post to this updated table yields:

    OrderNo DueDate         SysLogDate      SectionNumber_WithinDueDate
    1       2022-04-10      2022-01-10      1
    1       2022-04-10      2022-01-11      1
    1       2022-04-15      2022-01-15      2
    1       2022-04-13      2022-01-16      3 **
    1       2022-04-15      2022-01-17      3 **
    1       2022-04-10      2022-01-18      3 **
    1       2022-04-10      2022-01-19      3 **
    1       2022-04-10      2022-01-20      3 **
    2       2022-04-10      2022-02-16      1
    2       2022-04-10      2022-02-17      1
    2       2022-04-15      2022-02-18      2
    2       2022-04-15      2022-02-20      2
    2       2022-04-15      2022-02-21      2
    2       2022-04-10      2022-02-22      3
    2       2022-04-10      2022-02-24      3
    2       2022-04-10      2022-02-26      3

Here's a demo to work that uses the above updated table and the solution from my prior post, and shows the above non-desired results: Fiddle

Demo showing same effect when the date is replaced with NULL: Fiddle with NULL

Copy of the selected solution from my prior post (used in the above Fiddles):

select OrderNo, DueDate, SysLogDate, 
      dense_rank() over(partition by orderno order by gp) SectionNumber_WithinDueDate
    from (
      select *,
        Row_Number() over(partition by OrderNo order by SysLogDate)
        - Row_Number() over(partition by OrderNo, DueDate order by SysLogDate) gp
      from #DueDates
    )t
    order by OrderNo, SysLogDate;

It's a small change in the data, but I haven't been able to work out how to alter the 'Row_Number difference line' in the subquery to get the desired results.

Thank you for any advice you can offer here :)

Edit: In hindsight, I realized the the section number field would be more appropriately named "SectionNumber_WithinOrder" instead of "SectionNumber_WithinDueDate," but I've left things as-is to keep the names aligned with the code samples that were proposed.

jn4248
  • 105
  • 5

1 Answers1

1

For gap and island problem, I prefer to use lag() window function as it is easier to understand.

Use lag() to compare previous row value and when changed, set a flag (value 1). Perform a cumulative sum on the flag and you get the grp. Use dense_rank() on the grp and it gives you your SectionNumber_WithinDueDate

As you have NULL value, use ISNULL() to return a date value (99991231) for comparison

select OrderNo, DueDate, SysLogDate, 
       SectionNumber_WithinDueDate = dense_rank() over (partition by OrderNo 
                                                            order by grp)
from
(
    select *, grp = sum(g) over (partition by OrderNo 
                                     order by SysLogDate)
    from   
    ( 
           select *,
                  g = case when isnull(DueDate, '99991231')
                           <>   isnull(lag(DueDate) over (partition by OrderNo 
                                                              order by SysLogDate), '99991231')
                           then 1
                           else 0
                           end
           from   #DueDates
   ) d
) d
order by OrderNo, SysLogDate;

Fiddle on your sample data :

fiddle 1

fiddle 2

Squirrel
  • 23,507
  • 4
  • 34
  • 32
  • Thank you Squirrel, This is great! I agree this works for both table scenarios, and I think the usage of "Lag" helps visualize what's going on in the data. Just to make sure I understood the logic, I mapped it out as sequential CTE statements, which also yielded the same results. Just another way to look at it, I suppose, but I appreciate the compactness of your solution. [Fiddle 1}(https://dbfiddle.uk/2RtSmKTm) , [Fiddle 2 - With Null](https://dbfiddle.uk/oBiryTVW) – jn4248 Nov 10 '22 at 23:36
  • yes. both query are essentially the same. Using CTE does makes it easier to break down the query and inspect it at every stage – Squirrel Nov 11 '22 at 01:12