5

I have the following stored procedure which returns A, B, and the count in descending order. I am trying to use ROW_NUMBER, so I can page the records, but I want the first row number 1 to be the record with the highest count, so basically, if I return a table with 3 records and the count is 30, 20, 10, then row number 1 should correspond with count 30, row number 2 should correspond with count 20, and row number 3 should correspond with count 10. dbo.f_GetCount is a function that returns a count.

create procedure dbo.Test
as
@A nvarchar(300) = NULL,
@B nvarchar(10) = NULL
as

select @A = nullif(@A,'')
      ,@B = nullif(@B,'');

select h.A
      ,hrl.B
      ,dbo.f_GetCount(hrl.A,h.B) as cnt
from dbo.hrl
    inner join dbo.h
        on h.C = hrl.C
where(@A is null
      or h.A like '%'+@A+'%'
     )
     and (@B is null
          or hrl.B = @B
         )
group by hrl.B
        ,h.A
order by cnt desc;
iamdave
  • 12,023
  • 3
  • 24
  • 53
Xaisoft
  • 45,655
  • 87
  • 279
  • 432
  • Append a semicolon after @B = NULLIF(@B,'') if you want a WITH clause to be used in your statement. – Quassnoi Apr 01 '09 at 18:26

3 Answers3

6
WITH q AS
        (
        SELECT h.A, hrl.B,
              dbo.f_GetCount(hrl.A,h.B) as cnt
        FROM dbo.hrl
        INNER JOIN dbo.h on h.C = hrl.C
        WHERE (@A IS NULL OR h.A like '%' + @A + '%') 
          AND (@B IS NULL OR hrl.B = @B)
        GROUP BY hrl.B, h.A
        )
SELECT  q.*, ROW_NUMBER() OVER (ORDER BY cnt DESC) AS rn
FROM    q
ORDER BY rn DESC

To retrieve first 10 rows, use:

WITH q AS
        (
        SELECT h.A, hrl.B,
              dbo.f_GetCount(hrl.A,h.B) as cnt
        FROM dbo.hrl
        INNER JOIN dbo.h on h.C = hrl.C
        WHERE (@A IS NULL OR h.A like '%' + @A + '%') 
          AND (@B IS NULL OR hrl.B = @B)
        GROUP BY hrl.B, h.A
        )
SELECT  TOP 10 q.*, 
        ROW_NUMBER() OVER (ORDER BY cnt DESC, A, B) AS rn
FROM    q
ORDER BY cnt DESC, A, B

To retrieve rows between 11 and 20, use:

SELECT  *
FROM    (
        WITH q AS
                (
                SELECT h.A, hrl.B,
                      dbo.f_GetCount(hrl.A,h.B) as cnt
                FROM dbo.hrl
                INNER JOIN dbo.h on h.C = hrl.C
                WHERE (@A IS NULL OR h.A like '%' + @A + '%') 
                  AND (@B IS NULL OR hrl.B = @B)
                GROUP BY hrl.B, h.A
                )
        SELECT  q.*, 
                ROW_NUMBER() OVER (ORDER BY cnt DESC, A, B) AS rn
        FROM    q
        ) qq
WHERE rn BETWEEN 11 AND 20
ORDER BY cnt DESC, A, B
Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • Just curious, what exactly is the WITH for? Does it improve performance? – Xaisoft Apr 01 '09 at 18:17
  • In this case, it was simplier to copy and paste using with :) It's useful for CTE's, in other ways it behaves same as a subquery. – Quassnoi Apr 01 '09 at 18:21
  • I am going to update my procedure in my question because it was actually giving me an error saying I need to terminate my statement, so maybe you can update your answer when I update my question. Thanks so far. – Xaisoft Apr 01 '09 at 18:23
  • Append a semicolon to your previous statement, right before WITH. – Quassnoi Apr 01 '09 at 18:25
  • Ok, I ran it with a semicolon after the first select statement and I get two errors: Incorrect Syntax near (, the one after the WITH and Incorrect Syntax near the keyword ORDER – Xaisoft Apr 01 '09 at 18:37
  • Ok, I made the change and now I get the error Incorrect Syntax near keyword 'as'. The 'as' where it says ') AS q' – Xaisoft Apr 01 '09 at 18:45
  • I changed this also, copy the whole new query. – Quassnoi Apr 01 '09 at 18:45
  • What part did you change this time? – Xaisoft Apr 01 '09 at 18:48
  • lol, ok thanks, i was looking at it and trying to figure out what you changed. – Xaisoft Apr 01 '09 at 18:51
  • Where would I put the where clause? I did the one for Casper's, but it takes 8 seconds to return 10 records. – Xaisoft Apr 01 '09 at 19:04
  • @Xaisoft: You can put it either in the CTE expression or on the outer query between "from q" and "order by" – dance2die Apr 01 '09 at 19:09
  • Yes, I tried 'WHERE rn between 1 and 10', but it says rn is an invalid column name. – Xaisoft Apr 01 '09 at 19:10
  • I am actually trying to do what is in the following link so I can speed up paging for my GridView in asp.net: http://www.davidhayden.com/blog/dave/archive/2005/12/30/2652.aspx – Xaisoft Apr 01 '09 at 19:15
  • If I do Top 10, it actually returns the records with the lowest count first because your result set returns the records in ascending order, so what I actually need is where rn between 1 and 10, but can't get that to work – Xaisoft Apr 01 '09 at 19:17
  • Just checked, my query returns top 10 HIGHEST counts, numbered from 1 to 10 in RN field. – Quassnoi Apr 01 '09 at 19:27
  • Yes that is right, it does match rn 1 with the highest count and rn 2 with the second highest, but it is displaying with with the cnt going in ascending order, so it the first row will have a cnt of 1 and an rn of 137 – Xaisoft Apr 01 '09 at 19:33
  • Any idea how I can implement the where clause? – Xaisoft Apr 01 '09 at 19:34
  • Did you put "ORDER BY cnt DESC" instead of "ORDER BY rn" – Quassnoi Apr 01 '09 at 19:44
  • Ok that was it, now all that is missing is the WHERE clause? Any ideas? – Xaisoft Apr 01 '09 at 19:47
  • WHERE clause for what? To limit by rn? If you didn't forget TOP 10, then the query will return only TOP 10 rows without additional clauses. – Quassnoi Apr 01 '09 at 19:51
  • For example, I want to return only the rows between 11 and 20 or between 21 and 30. – Xaisoft Apr 01 '09 at 19:51
  • @Quassnoi: Using the TOP syntax is incorrect for two reasons. The first is performance, you might actually cause a second ordering on the result set (in addition to ROW_NUMBER). – casperOne Apr 02 '09 at 13:12
  • @Quassnoi: The second is if you have two groups with the same count, the groups are not guaranteed to be returned in the same order between the ROW_NUMBER and ORDER BY operations. You have to have another field to uniquely identify the groups (which is why you should order on ROW_NUMBER) – casperOne Apr 02 '09 at 13:14
  • @casperOne: first, SQL Server is smart enough to see that ORDER BY and ROW_NUMBER() use the same sort order and not to sort twice. – Quassnoi Apr 02 '09 at 13:20
  • @casperOne: second, my second query does not use TOP at all, it orders the results already returned and filtered by ROW_NUMBER() – Quassnoi Apr 02 '09 at 13:22
  • @casperOne: and third, you'd better correct the same things in your own answer before criticizing and downvoting mine. – Quassnoi Apr 02 '09 at 13:23
  • @Quassnoi: While SQL Server might be smart enough, that is an implementation detail of the query optimizer, and general programming practice dictates you not rely on implementation details. And yes, you do use TOP, in the second query, to select the next 10 rows. – casperOne Apr 02 '09 at 13:24
  • @Quassnoi: Rather, I meant the first ten rows (in the second query). – casperOne Apr 02 '09 at 13:25
  • @Quassnoi: Neither of my recommendations suffers from any of the points I made out about your queries. I do not use the TOP syntax, and I'm always ordering on unique identifiers for the rows. – casperOne Apr 02 '09 at 13:28
  • @casperOne: Where exactly do you order on unique identitiers in "order by t.cnt desc"? – Quassnoi Apr 02 '09 at 13:30
  • @Quassnoi: Your statement about downvoting is meaningless and adds nothing to the conversation, nor was I out of line to downvote something that I believe to be in error (which is separate from my own ideas on how best address the question). – casperOne Apr 02 '09 at 13:30
  • @casperOne: And what makes you sure "ORDER BY t.cnt" will persist between the queries? – Quassnoi Apr 02 '09 at 13:31
  • @Quassnoi: The "order by t.cnt desc" is done just once in the context of the query and then assigned a unique number on which future orderings are performed. – casperOne Apr 02 '09 at 13:33
  • @Quassnoi: In your examples, you order on the count *twice* which leads to the possibility of incorrect ordering between the row number and the order by in the *same* query. – casperOne Apr 02 '09 at 13:33
  • @Quassnoi: Yes, you are right, my query will not maintain the same order between queries, but within the same query, it will, which is what yours are lacking due to the double ordering. – casperOne Apr 02 '09 at 13:34
  • @casperOne: you have two pages. To retrieve them, you issue two queries: BETWEEN 1 AND 10 and BETWEEN 11 AND 20. Let's assume that CNT is equal for all rows. What makes you sure the second query will not miss or double anything if you just order by CNT? – Quassnoi Apr 02 '09 at 13:36
  • Please show me where exactly there is such a possibility, given that TOP 10 does not rely on ROW_NUMBER() at all (in the 2nd query), and ORDER BY happens only AFTER filtering by ROW_NUMBER() (in the 2rd query) – Quassnoi Apr 02 '09 at 13:41
  • @Quassnoi: I've updated my post to show how you can guarantee order across multiple queries on the same set of data. To your first question, it won't miss or double anything, but it will not guarantee the order applied to ROW_NUMBER is the order applied to rows in the ORDER BY statement. – casperOne Apr 02 '09 at 13:44
  • @Quassnoi: That's an implementation detail (assuming all counts are equal, how items are ordered between the separate operations). As to your second question, it's the same answer. The way that items are ordered in a ROW_NUMBER operation and ORDER BY are not guaranteed to be the same. – casperOne Apr 02 '09 at 13:45
  • @Quassnoi: Because all counts are equal (in your example), the ORDER BY statement is not guaranteed to return those records in the same order as the ROW_NUMBER function. It should be noted that you have updated your answer to address the ordering across multiple queries. – casperOne Apr 02 '09 at 13:48
  • Why do you need the same order on ORDER BY and ROW_NUMBER()? For what purpose? If you use TOP, then the ROW_NUMBER() is not used at all. If you filter by ROW_NUMBER(), the the results will be ordered only AFTER they are filtered. Besides, i've updated my post to order on A and B along with CNT. – Quassnoi Apr 02 '09 at 13:50
  • @Quassnoi: You've edited your post to reflect that. My comments stemmed from the older version of your queries which suffered from the issues I pointed out. Since you have corrected the issues I've removed the downvote, but they WERE issues (otherwise, why would you have edited it? =) – casperOne Apr 02 '09 at 13:55
  • Sure they were. What I initially wanted to say my query was no worse than yours :) – Quassnoi Apr 02 '09 at 13:59
3

I would use a sub-query to get the values of the function into the result, and then the ROW_NUMBER ranking function, like so:

select
    ROW_NUMBER() over (order by t.cnt desc) as RowId, t.*
from
    (
        SELECT
            h.A, hrl.B, dbo.f_GetCount(hrl.A,h.B) as cnt
        FROM
            dbo.hrl
                INNER JOIN dbo.h on h.C = hrl.C
        WHERE 
            (@A IS NULL OR h.A like '%' + @A + '%') AND 
            (@B IS NULL OR hrl.B = @B)
        GROUP BY
            hrl.B, h.A
    ) as t
order by
    1

If you wanted only a certain section of results (say, for paging), then you would need another subquery, and then filter on the row number:

select
    t.*
from
    (
        select
            ROW_NUMBER() over (order by t.cnt desc) as RowId, t.*
        from
            (
                SELECT
                    h.A, hrl.B, dbo.f_GetCount(hrl.A,h.B) as cnt
                FROM
                    dbo.hrl
                        INNER JOIN dbo.h on h.C = hrl.C
                WHERE 
                    (@A IS NULL OR h.A like '%' + @A + '%') AND 
                    (@B IS NULL OR hrl.B = @B)
                GROUP BY
                    hrl.B, h.A
            ) as t
    ) as t
where
    t.RowId between 1 and 10
order by
    t.RowId

Note that in this query, you could put ROW_NUMBER anywhere in the select list, since you are no longer reliant on using the "order by 1" syntax for the order by statement.

There is a subtle issue here when calling this query multiple times. It is not guaranteed that the order in which the records are returned are going to be consistent if the number of items in each group is not unique. In order to address this, you have to change the ROW_NUMBER function to order on the fields that make up the group in the count.

In this case, it would be A and B, resulting in:

select
    t.*
from
    (
        select
            ROW_NUMBER() over (order by t.cnt desc, t.A, t.B) as RowId, t.*
        from
            (
                SELECT
                    h.A, hrl.B, dbo.f_GetCount(hrl.A,h.B) as cnt
                FROM
                    dbo.hrl
                        INNER JOIN dbo.h on h.C = hrl.C
                WHERE 
                    (@A IS NULL OR h.A like '%' + @A + '%') AND 
                    (@B IS NULL OR hrl.B = @B)
                GROUP BY
                    hrl.B, h.A
            ) as t
    ) as t
where
    t.RowId between 1 and 10
order by
    t.RowId

This ends up ordering the results consistently between calls when the count of the items between groups is not unique (assuming the same set of data).

casperOne
  • 73,706
  • 19
  • 184
  • 253
  • So far, yours has worked, but I am still curious about the order by 1 and if I wanted to return rows 1 thru 10, where would I put the where clause? – Xaisoft Apr 01 '09 at 18:25
  • @Xiasoft: To order by the first field. It is possible that you will have counts that are the same. In order to get around that, you have to put more order fields in the over (order by...) section. This will affect the row_number, which is the only thing you should order on (hence the 1). – casperOne Apr 01 '09 at 18:28
  • So 1 is short t-sql syntax for basically saying order by RowId – Xaisoft Apr 01 '09 at 18:36
  • No, it's a shortcut for saying "order by first field in the SELECT list" – Quassnoi Apr 01 '09 at 18:38
  • Ok, in this case, isn't the first field, RowId? – Xaisoft Apr 01 '09 at 18:39
  • Why are you performing a subselect here...? – Adam Robinson Apr 01 '09 at 18:39
  • And unless I've missed something fundamental (entirely possible), "order by 1" is a meaningless statement. It's going to order by a constant value of 1 for every row, meaning that it will not perform any ordering at all. – Adam Robinson Apr 01 '09 at 18:40
  • @Adam: It is a valid syntax in SQL Server. It means ordering by the column index. – dance2die Apr 01 '09 at 18:42
  • @Adam Robinson: Subselect is to prevent the function from possibly being called twice, since you would have to put it in the ranking function and on it's own. It's true the optimizer might only make the call once, but that's more of an implementation detail that should not be relied on. – casperOne Apr 01 '09 at 18:43
  • @Adam Robinson: You are incorrect about the order by. The "order by 1" as Quassnoi indicates means order by the first field in the SELECT list, in this case, the RowId, but I don't believe I can reference RowId in the order by, since it is a calculated value (I could be wrong here). – casperOne Apr 01 '09 at 18:44
  • Caspers procedure works so far, but I am even more confused now. – Xaisoft Apr 01 '09 at 18:47
  • I stand corrected, it does indeed to that. Still seems like a rather obtuse practice. – Adam Robinson Apr 01 '09 at 18:48
  • @Xaisoft: ROWID is a reserved word in Oracle. For an Oracle user "order by ROWID" means "keep implicit order" as the rows are returned in ROWID order from table scans. This confused me when I answered. In your case, yes, ORDER BY 1 and ORDER BY RowID are synonyms. – Quassnoi Apr 01 '09 at 18:48
  • @Xaisoft: See my updated post for how to filter based on the result of ROW_NUMBER. – casperOne Apr 01 '09 at 18:49
  • Casper, I added your changes, but it takes a while to return 10 records. Without the where clause, it actually runs faster when returning 137 records. – Xaisoft Apr 01 '09 at 19:07
  • @Xiasoft: When using the ROW_NUMBER ranking function, it requires a full processing of the items in the table, since they have to be ordered. You might want to consider using temp tables and indexes on those tables if you have a large set of data you need to order like this. – casperOne Apr 01 '09 at 19:16
  • ah, i was afraid of that, i thought temp ROW_NUMBER was gonna help the user get away from temp tables as noted in this post: http://www.davidhayden.com/blog/dave/archive/2005/12/30/2652.aspx – Xaisoft Apr 01 '09 at 19:19
0
SELECT h.A, hrl.B,
       dbo.f_GetCount(hrl.A,h.B) as cnt,
ROW_NUMBER() over (order by cnt desc) as row_num
FROM dbo.hrl
INNER JOIN dbo.h on h.C = hrl.C
WHERE (@A IS NULL OR h.A like '%' + @A + '%') 
  AND (@B IS NULL OR hrl.B = @B)
GROUP BY hrl.B, h.A
ORDER BY cnt desc

This should do the trick. I don't have SSMS in front of me to test, but you MAY have to substitute the usage of 'cnt' in the ROW_NUMBER's order by clause with a second call to the function, but this should give you the general idea.

Adam Robinson
  • 182,639
  • 35
  • 285
  • 343