0

Take two...

Actual data:

division    ID          date            flag
ABC123      ZZZ123      1/17/2013       Y
ABC123      ZZZ123      1/25/2013       N
ABC123      ZZZ123      2/22/2013       Y
ABC123      ZZZ123      2/26/2013       N
ABC123      YYY222      3/20/2013       Y
ABC123      YYY222      5/17/2013       N
XYZ456      ZZZ999      1/15/2012       N
XYZ456      ZZZ999      1/30/2012       N
XYZ456      ZZZ123      2/09/2012       N
XYZ456      ZZZ123      4/13/2012       Y
XYZ456      ZZZ123      6/23/2012       N
XYZ456      ZZZ123      10/5/2012       Y
XYZ456      ZZZ123      11/18/2012      N

I need to build a new column, ORDER_group, that will populate based on the following rules:

  • Each division and ID combination is considered a "group", sorted by date, and should have an ORDER_group (starting with 1) assigned to it.
  • Each time a "group" encounters a flag of "Y", it should increment the ORDER_group by 1.
  • If the "group" starts (first record with the earliest date) with a flag = "N", it should still start with ORDER_group = 1.
  • If the "group" starts (first record with the earliest date) with a flag = "Y", it should still start with ORDER_group = 1.
  • Each subsequent record should be the same ORDER_group number, unless a new "group" (division/ID) is encountered, at which, it should reset back to 1, or the next flag = "Y" is encountered.

Expected results:

division    ID          date            flag    ORDER_group
ABC123      ZZZ123      1/17/2013       Y       1
ABC123      ZZZ123      1/25/2013       N       1
ABC123      ZZZ123      2/22/2013       Y       2
ABC123      ZZZ123      2/26/2013       N       2
ABC123      YYY222      3/20/2013       Y       1
ABC123      YYY222      5/17/2013       N       1
XYZ456      ZZZ999      1/15/2012       N       1
XYZ456      ZZZ999      1/30/2012       N       1
XYZ456      ZZZ123      2/09/2012       N       1
XYZ456      ZZZ123      4/13/2012       Y       2
XYZ456      ZZZ123      6/23/2012       N       2
XYZ456      ZZZ123      10/5/2012       Y       3
XYZ456      ZZZ123      11/18/2012      N       3

Ideally this should be accomplished without a loop/cursor, unless there are performance reasons with CTE/temp tables. What is the best way to populate this new column?

Any help would be greatly appreciated.

SQL Fiddler for Actual data: http://sqlfiddle.com/#!3/5cca0/2

user2162331
  • 99
  • 2
  • 12

2 Answers2

1

So here a way to do it. It based on How do I calculate a running total in SQL without using a cursor? which does have some flaws. I'm using an index on the advice that it makes the ordering work out DESPITE the fact that order on the update is not guaranteed.

And it also worth pointing you to Calculate running total / running balance for Aaron Bertrand treatment.

The possibly clever bit here is the conversion of Y/N to 1/0 for use in calculating.

CREATE TABLE Orders (division CHAR(6),ID CHAR(6),dat DATETIME, flag CHAR(1))
INSERT INTO Orders VALUES

('ABC123','ZZZ123','01/17/2013','Y')
,('ABC123','ZZZ123','01/25/2013','N')
,('ABC123','ZZZ123','01/25/2013','N')
,('ABC123','ZZZ123','01/25/2013','N')
,('ABC123','ZZZ123','01/25/2013','N')
,('ABC123','ZZZ123','02/22/2013','Y')
,('ABC123','ZZZ123','02/26/2013','N')
,('ABC123','YYY222','03/20/2013','Y')
,('ABC123','YYY222','05/17/2013','N')
,('XYZ456','ZZZ999','01/15/2012','N')
,('XYZ456','ZZZ999','01/30/2012','N')
,('XYZ456','ZZZ123','02/09/2012','N')
,('XYZ456','ZZZ123','04/13/2012','Y')
,('XYZ456','ZZZ123','06/23/2012','N')
,('XYZ456','ZZZ123','010/5/2012','Y')
,('XYZ456','ZZZ123','11/18/2012','N')


CREATE TABLE #Orders (division CHAR(6),    ID CHAR(6),   dat DATETIME, flag CHAR(1),flag_int INTEGER, rn BIGINT, OrderGroup INT)

CREATE CLUSTERED INDEX IDX_C_Temp_Order ON #Orders(division, id,rn)

INSERT INTO #Orders (division, id,dat,flag,flag_int,rn,OrderGroup)
SELECT division
      ,ID
      ,dat
      ,flag
      ,CASE flag WHEN 'y' THEN 1 ELSE 0 END flag_int
      ,ROW_NUMBER() OVER (PARTITION BY division, id ORDER BY dat) rn
      ,0 OrderGroup
  FROM Orders

DECLARE @OrderGroup INT = 0
UPDATE #Orders
   SET @OrderGroup = OrderGroup  = CASE WHEN rn = 1 THEN 1 ELSE @OrderGroup + flag_int END
  FROM #Orders

SELECT * 
  FROM #Orders
 ORDER BY division
         ,ID
         ,rn

DROP TABLE #Orders
Community
  • 1
  • 1
Karl Kieninger
  • 8,841
  • 2
  • 33
  • 49
  • This does appear to work. I'll run through some bigger tests, but thanks in advance. I'm not sure why I was trying to over complicate it by using a CTE, when I could have used a few temp tables to accomplish it like you did. – user2162331 Apr 11 '14 at 04:46
  • There are recursive `CTE`s and non-recursive. The latter can perform the role of a temp table which exists for the life the the query and no longer. – Michael Green Apr 11 '14 at 05:20
  • CTE aren't usually over complicated. I can index a CTE so it not a perfect substitute here. Other than that I suppose I could maybe have UPDATE..OUTPUT a CTE for the same result. – Karl Kieninger Apr 11 '14 at 11:19
0

How about a simpler solution?

WITH Data AS (
    SELECT
        *,
        Num = Row_Number() OVER (PARTITION BY division, ID ORDER BY date)
    FROM MyTable
)
SELECT
    *
FROM
    Data D
    CROSS APPLY (
        SELECT Count(*)
        FROM Data D2
        WHERE
            D.division = D2.division
            AND D.ID = D2.ID
            AND D.date >= D2.date
            AND (D2.flag = 'Y' OR D2.Num = 1)
    ) G (OrderGroup)
;

See it live in a SQL Fiddle

ErikE
  • 48,881
  • 23
  • 151
  • 196