1

I have a question quite similar to How to add sequence number for groups in a SQL query without temp tables

To take the sample from there, I have a data-structure that consists of the first two columns of this table, and I want to generate the third column in my query:

Record  Group     GroupSequence
-------|---------|--------------
1       Chickens  1
2       Chickens  1
3       Horses    2
4       Cows      3
5       Horses    4
6       Horses    4

The difference to the quoted query is, that I need a new number if the previous line didn't have the same value in the second column.

I tried adding a row-number and using LAG() to check if the previous line had the same value - if so, take the previous value of this generated column, if not, take the row-number - but it looks like you can't reuse the column you're building.

The value just needs to be a different number - it doesn't matter if it's in order. This would also be fine:

Record  Group     GroupSequence
-------|---------|--------------
1       Chickens  1
2       Chickens  1
3       Horses    3
4       Cows      4
5       Horses    5
6       Horses    5
Community
  • 1
  • 1
SimonSimCity
  • 6,415
  • 3
  • 39
  • 52
  • why not 2 for horses in second ouput? – TheGameiswar Aug 04 '16 at 07:21
  • @TheGameiswar If I get 2 or 3 for `Horses` doesn't matter to me. It just needs to be a different number if the previous line doesn't have the same value. What matters is, that the second group of `Horses` has a different number than the first group. – SimonSimCity Aug 04 '16 at 07:24
  • 1
    Do some reading on RANK() and DENSE_RANK() functions – PacoDePaco Aug 04 '16 at 07:33

1 Answers1

2

I think this is what you need:

WITH Src AS
(
    SELECT * FROM (VALUES 
    (1, 'Chickens'),
    (2, 'Chickens'),
    (3, 'Horses  '),
    (4, 'Cows    '),
    (5, 'Horses  '),
    (6, 'Horses  '))T(Record, [Group])
), Differentiator AS
(
    SELECT *,
    ROW_NUMBER() OVER (ORDER BY Record) -
        RANK() OVER (PARTITION BY [Group] ORDER BY Record) Diff
    FROM Src
)
SELECT Record, [Group], DENSE_RANK() OVER (ORDER BY [Group],Diff) NewGroup
FROM Differentiator
ORDER BY Record

It produces following table:

Record   Group      NewGroup
------   -----      --------
1        Chickens   1
2        Chickens   1
3        Horses     3
4        Cows       2
5        Horses     4
6        Horses     4

Short explanation:

The key is to calculate relative positions of records in whole table and in each '[Group]' group. If records are adjacent, global number is increasing by 1 and local number is increased by 1. Thus, ROW_NUMBER() - RANK() is the same for all records. If there is a gap, there is also distortion in global numbering. It leads to different numbers generated by ROW_NUMBER() - RANK() in separated groups.

Paweł Dyl
  • 8,888
  • 1
  • 11
  • 27
  • it just blew my mind. All this tricks with numbers you're dealing with here ... Can you please explain how it works? Just to wrap my head around it and to prove that there won't be a wrong number in the `NewGroup` column ever? – SimonSimCity Aug 04 '16 at 07:54
  • Got it. This is pretty clever. I first thought that it would at some point have a conflict again, but as Diff increases as Record increases, there won't ever be a group-mismatch (like having one number for two different groups in the result). – SimonSimCity Aug 04 '16 at 08:31
  • 1
    I added short explanation and `ROW_NUMBER` to handle gaps in `Record` column. – Paweł Dyl Aug 04 '16 at 08:32
  • How is this different from `WITH Src AS ( SELECT * FROM (VALUES (1, 'Chickens'), (2, 'Chickens'), (3, 'Horses '), (4, 'Cows '), (5, 'Horses '), (6, 'Horses '))T(Record, [Group]) ) SELECT Record, [Group], DENSE_RANK() OVER (ORDER BY [Group]) NewGroup FROM Src ORDER BY Record` in other words - ommiting the Differentiator CTE. – andowero May 22 '20 at 07:07