1

I have a SQL question. Below is my data and query

select ID from table

ID

4   
4   
5   
3   
5   
3   
3

What query should I add to my select list so that it gives me the occurrence of a value for each specfic row(I do not want the total count, I want something like a rank of occurrence).

4 1--this is the first time we got a 4 in the list, so 1
4 2--this is the second time we got a 4 in the list, so 2
5 1--this is the first time we got a 5 in the list, so 1
3 1--this is the first time we got a 3 in the list, so 1
5 2--this is the second time we got a 5 in the list, so 2
3 2--this is the second time we got a 3 in the list, so 2
3 3--this is the third time we got a 3 in the list, so 3
Felix Pamittan
  • 31,544
  • 7
  • 41
  • 67
  • might be similar to [mode calculation][1] [1]: http://stackoverflow.com/questions/11223003/mode-calculation-without-a-subquery-field-in-mysql – Francis Fuerte Jan 30 '15 at 02:57
  • That's for MySQL, I don't think any of that can help here. :( Steve has already nailed it. – Andrew Jan 30 '15 at 03:09
  • It's not quite that simple. `row_number()` is the right direction, but to match the expected output, the trick is to preserve the original row order. – Paul Williams Jan 30 '15 at 03:26

2 Answers2

2

You want ROW_NUMBER():

SELECT 
    [Id],
    ROW_NUMBER() OVER (PARTITION BY [Id] ORDER BY [Id])
FROM ...

You'll probably need to change the Order By clause to get the right output.

Steve Dowling
  • 1,919
  • 1
  • 13
  • 18
  • 1
    By itself, `row_number() over (partition by id order by id)` will not preserve the original row order. If that matters, you would have to generate an order somehow. Note that `row_number() over (partition by id order by (select 1))` also loses the original ordering. (I am testing on SQL Server 2014 in 2012 compatibility.) – Paul Williams Jan 30 '15 at 03:19
  • That's why I mentioned that the Order By clause would need to be changed. I'm guessing that there are other fields in the table that could be used to set the order. – Steve Dowling Jan 30 '15 at 03:37
  • Thank you so much Paul and Steve. I tried both row_number() and rank() but I messed up with the order by. SO, I was not able to get the results. Both the answers work for me. – Manasa Deepthi Jan 30 '15 at 13:21
1

To generate a numbering from an order of rows, use row_number() or rank(). The problem is, row_number() does not guarantee the original order will be preserved. You could try this:

select
    [id],
    row_number() over (partition by id, order by (select 1)) as [rank]
from @t

However, you will find the results are not in the original order and somewhat confusing:

id  rank
3   1
3   2
3   3
4   1
4   2
5   1
5   2

To preserve the original row order, you could build a temporary table or table variable with an identity column. Select a row_number() partitioned by id from there:

declare @t table ([tkey] int identity(1,1) primary key clustered, [id] int)
insert into @t (id) values (4), (4), (5), (3), (5), (3), (3)

select
    [id],
    row_number() over (partition by [Id] order by [tkey]) as [rank]
from @t
order by [tkey]

Note that the final order by [tkey] really is necessary. The query has the desired results:

id  rank
4   1
4   2
5   1
3   1
5   2
3   2
3   3

Here is a Common Table Expression (CTE) approach. The CTE adds a row_number() to maintain the rows in the original order. (This is the equivalent of the identity column in the previous example.) The actual ranking comes with the second row_number() when it does a partition by id. This causes the 1st 4 gets 1, the 2nd 4 gets 2, etc.

The second row_number() must be ordered by the original order to rank correctly, but this is still not enough to preserve the order in the output. A final order by ensures the ending order is the same.

declare @t table (id int)
insert into @t (id) values (4), (4), (5), (3), (5), (3), (3)

;with [tRows] (rownum, id) as
(
    select
        row_number() over (order by (select 1)) as [rownum],
        [id]
    from @t
)
select
    [id],
    row_number() over (partition by id order by [rownum]) as [rank]
from [tRows]
order by [rownum]

This query also has the desired results:

id  rank
4   1
4   2
5   1
3   1
5   2
3   2
3   3

In this example, you could use rank() instead of the second row_number(). The difference between the functions is explained well in this question. rank() would not work the same if, somehow, there were duplicate row numbers generated by the first row_number(), but that cannot happen.

Community
  • 1
  • 1
Paul Williams
  • 16,585
  • 5
  • 47
  • 82
  • Thank you so much Paul. I tried both row_number() and rank() but I messed up with the order by. So, I was not able to get the results. – Manasa Deepthi Jan 30 '15 at 13:22