SQL Server rank query optimization

Question

I'm trying to optimize a query to get the rank of a user in specific country from the DB. Currently is seems very inefficient.

I'm trying to determine what can be done to improve it.

This is the current SQL:

SELECT COUNT(*) + 1
FROM leaderboard lb, users u
WHERE u.country = 'United States' 
  AND lb.id = u.id 
  AND lb.score + 1 > (SELECT lb2.score 
                      FROM leaderboard lb2
                      WHERE lb2.id = some_user_id);

Details:

Users table:

id is the primary key
country column is indexed

Leadeboard table:

id is the primary key
score column is indexed

The execution plan shows 2 warnings:

Columns With No Statistics: [dbo].[leaderboard].id (cost 23%)

Columns With No Statistics: [dbo].[users].id (cost 28%)

Could you post some DDL, Sample data and Expected result please? — Thom A, Dec 08 '17 at 15:55
Execution plan would help a lot on this too. And where's the rank function? — S3S, Dec 08 '17 at 15:55
This query is based on a query I already use and which works very good, SELECT @p_rank = COUNT(*) + 1 FROM leaderboard lb WHERE ur.score > (SELECT ur2.score FROM leaderboard lb2 WHERE ur2.id = some_id); I had to add a join for the country column. — SuperFrog, Dec 08 '17 at 15:58
What is `some_user_id`, why you need this filter on the third table? I don't know why you need this table `q360_leaderboard` at all. — Tim Schmelter, Dec 08 '17 at 15:59
Sorry, this was a typo, it's leaderboard table. I've edited the question. — SuperFrog, Dec 08 '17 at 16:05
If you share the execution plan we might be able to help with the missing statistics, which can resolve this issue. — artm, Jan 15 '18 at 00:29
What's the output for `select tblSS.name , tblSS.auto_created , tblSS.user_created , [index] = tblSI.name from sys.stats tblSS left outer join sys.indexes tblSI on tblSS.object_id = tblSI.object_id and tblSS.stats_id = tblSI.index_id where tblSS.object_id = object_id('dbo.leaderboard')` — artm, Jan 15 '18 at 01:03
PK_leaderboard 0 0 PK_leaderboard nci_wi_leaderboard_score 0 0 nci_wi_leaderboard_score nci_wi_leaderboard_daily 0 0 nci_wi_leaderboard_daily nci_wi_leaderboard_weekly 0 0 nci_wi_leaderboard_weekly Let me know if it's not readable and I will add a screen shot in the question. — SuperFrog, Jan 15 '18 at 01:09
Statitics are not auto created then. Can you try (at your own risk) creating these statistics and see if the query time improves? `CREATE STATISTICS [stat_leaderboard_id] ON dbo.leaderboard ( Id ) WITH FULLSCAN; CREATE STATISTICS [stat_users_id] ON dbo.users ( Id ) WITH FULLSCAN;` — artm, Jan 15 '18 at 01:11
If you have a developer/test environment try there first. Or backup/restore the database on the test server and try on the backup before production. SQL should create the statistics when it deems necesarry but if they haven't been auto-created, creating them manually shouldn't cause any problems but if the tables are large and creating them will be using tempDb, it might slow down the server. After creating the statistics and it fixes the issue, you'll probably also need to look at why they are not auto-created. — artm, Jan 15 '18 at 01:21
Two suggestions: (1) Create a _single_ index on `users` composed of `country` and `id` and one on `leaderboard` composed of `id` and `score`. (b) Change the `score` comparison so that you aren't performing arithmetic on the column, i.e. `... and lb.score >= ( select ... )` or `... and lb.score > ( select ... ) - 1`. (I'm assuming that `score` is an integer.) — HABO, Jan 16 '18 at 01:06
What if the `Some_User_id` does not belong to a user of the `'United States'` ... this query contains logical errors — Yahfoufi, Jan 16 '18 at 15:07
@Yahfoufi - you are correct, but please assume that there will be a correlation between the user and country — SuperFrog, Jan 16 '18 at 16:59
the sql statement you use in the question fails the logic. Here is an example with five users, the user with the id 1 has 75 points and should be on 3rd rank while your query results in ranking him on rank 5. Check this: http://rextester.com/YSKBO82261 — Raul, Jan 16 '18 at 22:22
@RaulSebastian I don't think it's because of the logic problem, but because of the count(*) + 1 — SuperFrog, Jan 16 '18 at 22:27
@UdiIdan it is not the `count(*) + 1`, it is the `score + 1` and that is a problem of implementation logic. — Raul, Jan 16 '18 at 22:32
Did using [covering indexes](http://www.dbadiaries.com/sql-server-covering-index-and-key-lookup/) make a difference in performance? ("Create a _single_ index on `users` composed of `country` and `id` and one on `leaderboard` composed of `id` and `score`.") Note that SQL Server 2005 and later versions support [included columns](http://msdn.microsoft.com/en-us/library/ms190806.aspx) as well as composite indexes. — HABO, Jan 17 '18 at 20:47

kjmerf · Answer 1 · 2018-01-16T17:20:02.117

5

I think you are looking for something like:

SELECT u.id AS user_id, u.country, lu.score, RANK() OVER (ORDER BY lu.score DESC) AS rnk
FROM users u
INNER JOIN leaderboard lu ON u.id = lu.id
WHERE u.country = 'United States'

You can see a demo here: http://rextester.com/KHM76159

edited Jan 16 '18 at 17:20

answered Dec 08 '17 at 16:58

kjmerf

4,275
3
21
29

Thanks, but it seems this query will return the rank as 1 no matter who is the user. – SuperFrog Dec 09 '17 at 21:41
1

@UdiIdan you are using it wrong then. See http://rextester.com/QWP15044 – Martin Smith Dec 09 '17 at 21:48
You are correct, it does work. Unfortunately it's slower than the original query. – SuperFrog Dec 09 '17 at 22:45
This is the correct response and should be marked as such. I checked the query plan and it outperforms your initial statement. I will post a link – Raul Jan 16 '18 at 10:15
@RaulSebastian - although you might be correct in theory, I did check this query on the server. Unfortunately it's much slower than the original (3X CPU time and 4X elapsed time). – SuperFrog Jan 16 '18 at 10:38
This query will return all users ranking from all countries, it is not an optimization of the OP query!! – Yahfoufi Jan 16 '18 at 14:54
1

Then there is no need to `PARTITION BY country` – Yahfoufi Jan 16 '18 at 16:12
Ok edit approved. Once you specify country below, the partition is not needed – kjmerf Jan 16 '18 at 17:19
@UdiIdan here is kbball query constrainted to a single userId compared with your original statement, with the schema as described in your question: http://rextester.com/PCP89821 I have tested it on Sql Server 2008R2, 2012 and 2016 all enterprise. It performs more than 20% faster from perspective of reads, cputime and the query plan is compact. – Raul Jan 16 '18 at 22:05

paparazzo · Answer 2 · 2018-01-20T13:55:33.843

2

Try storing the score in a variable then use it within the JOIN clause.

declare @score int = ( select top 1 score 
                       FROM leaderboard
                       WHERE id = some_user_id 
                     );

SELECT COUNT(*) + 1 as 'rank'
FROM leaderboard lb
JOIN users u
  ON lb.id = u.id 
 AND lb.score > @score
 AND u.country = 'United States';

edited Jan 20 '18 at 13:55

answered Jan 14 '18 at 22:32

paparazzo

44,497
23
105
176

After testing this approach, no performance improvement made at all – Yahfoufi Jan 20 '18 at 13:36
@Yahfoufi It was worth a try – paparazzo Jan 20 '18 at 13:38

Hadi · Accepted Answer · 2018-01-21T11:21:19.870

Update 1: Just remove computation from where clause and use Joins

Based on your comments, if my first suggestion didn't improve the performance then i think the only thing you can do is: first of all, make sure that you created all needed indexes and statistics and remove computation from the WHERE clause, because it is not necessary, and use JOINs instead of linking table in the where clause (Using Joins doesn't improve performance but it's syntax is clearer and less ambiguous)

SELECT COUNT(*) + 1
FROM leaderboard lb INNER JOIN users u
ON lb.id = u.id 
WHERE u.country = 'United States' 
AND lb.score  > (SELECT lb2.score 
                  FROM leaderboard lb2
                  WHERE lb2.id = some_user_id)

Note that lb.score + 1> (SELECT lb2.score FROM leaderboard lb2 WHERE lb2.id = some_user_id) is equivalent to lb.score >= (SELECT lb2.score FROM leaderboard lb2 WHERE lb2.id = some_user_id) if score is integer, and you don't need it.

Initial answer: Use one of the Ranking functions with a subquery or CTE

I think it is better to use a ranking function like RANK()

Subquery

SELECT * FROM (

    SELECT u.id AS user_id, u.country, lb.score, RANK() OVER (ORDER BY lb.score DESC) AS rnk
    FROM users u
    INNER JOIN leaderboard lb ON u.id = lb.id
    WHERE u.country = 'United States' 

) T1 WHERE T1.user_id = some_user_id

Common table expression

 WITH CTE_1 AS (

    SELECT u.id AS user_id, u.country, lb.score, RANK() OVER (ORDER BY lb.score DESC) AS rnk
    FROM users u
    INNER JOIN leaderboard lb ON u.id = lb.id
    WHERE u.country = 'United States' 

) SELECT * FROM CTE_1 WHERE CTE_1.user_id = some_user_id

References

I've tried both options, but they are much slower than the original. — SuperFrog, Jan 16 '18 at 17:02
This one works much better than the previous. It's now about the same as the original query, Maybe it's the best possible. — SuperFrog, Jan 16 '18 at 21:22
@UdiIdan i think that the only way you can optimize the query is to create the necessary indexes and statistics, if so then you are on the right way — Hadi, Jan 16 '18 at 21:24
I did. I also tried a totally different approach, which can be seen here: https://stackoverflow.com/questions/48287218/stored-procedure-slow-when-executed-from-application-server?noredirect=1#comment83564958_48287218 — SuperFrog, Jan 16 '18 at 21:26
I think that this is the only suitable answer (the answer update part), it enhance the performance a very little bit, but this is the only way, the query is already good — Yahfoufi, Jan 20 '18 at 15:44

TFD · Answer 4 · 2018-01-19T08:20:44.033

0

Make it more readable/testable and force execution steps using CTE's

WITH scoreToRank AS (
  SELECT score
  FROM leaderboard
  WHERE id = {some_user_id}
)

, usersInCountry AS (
  SELECT id
  FROM users
  WHERE country = 'United States'
)

, countOfUsersWithGreaterScore AS (
  SELECT COUNT(*) AS count
  FROM leaderboard l
  INNER JOIN usersInCountry u ON u.id = l.id
  WHERE l.score > (SELECT score FROM scoreToRank)
)

SELECT count + 1 AS usersRank FROM countOfUsersWithGreaterScore

Depending on SQL version and data density, using as IN clause for countOfUsersWithGreaterScore / usersInCountry may be more efficient

edited Jan 19 '18 at 08:20

answered Jan 14 '18 at 22:06

TFD

23,890
2
34
51

1

*and force execution steps using CTE*: This is wrong... CTEs are very readable, but they are not executed in the order you see. Go [to this connect issue](https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=218968) and vote for it. And [read this](https://social.msdn.microsoft.com/Forums/sqlserver/en-US/82c97fa4-2b63-4d0d-90c6-56943fcecd29/cte-execution-order?forum=sqldatabaseengine) – Shnugo Jan 18 '18 at 09:41
@Shnugo I don't normally use SQL Server, and this code would work on most modern SQL engines. I tested this on SQL Server 2016 and the execution plans followed the steps as per CTE layout. Maybe with simpler queries SQL Server tries to optimise them out, but in this case not so much – TFD Jan 19 '18 at 08:23
However - this is random... A CTE **does not influence the order of execution in any way** (regretfully!). It's the engine chosing the - hopefully - best way. That's why some processes run much faster if you push the CTE's result into a temp table instead. But this cannot work in `VIEWs` or `iTVFs`. Please vote the connect issue and let's hope for a future version. – Shnugo Jan 19 '18 at 08:34

score 0 · Answer 5 · answered Jan 14 '18 at 22:34

0

Can you please try this? It looks a little odd but I think it might work:

SELECT COUNT(*) + 1
FROM leaderboard lb, users u, leaderboard lb2
WHERE u.country = 'United States' 
  AND lb.id = u.id 
  AND lb.score + 1 > lb2.score AND lb2.id = some_user_id

answered Jan 14 '18 at 22:34

Ctznkane525

7,297
3
16
40

Oceans · Answer 6 · 2018-01-17T12:03:06.487

I prefer joins over subqueries, the query below should give you the same result from your question.

SELECT COUNT(*) + 1
FROM leaderboard lb2
    LEFT OUTER JOIN users u ON u.Id <> lb2.Id AND u.country = 'United States'
    LEFT OUTER JOIN leaderboard lb ON lb.Id = u.Id
WHERE lb2.Id = some_user_id AND lb.score >= lb2.score

If some users don't have scores than you should check for null values or you could change the joining order, which might even be better in some cases:

SELECT COUNT(*) + 1
FROM leaderboard lb2
    LEFT OUTER JOIN leaderboard lb ON lb.Id <> lb2.Id AND lb.score >= lb2.score 
    LEFT OUTER JOIN users u ON u.Id = lb.Id
WHERE lb2.Id = some_user_id AND u.country = 'United States'

Gene · Answer 7 · 2018-01-20T01:21:27.933

This doesn't address your query, but rather the overall problem in case that's useful.

I confronted the same for an international contest where the teams table could get quite big. I never was able to get an SQL ranking query to perform well enough for a good user experience (goal was 80ms, and the query was somewhat more complex than yours), so finally decided to use a redis server just for returning ranks.

It provides a ranking function exactly suited to this problem. It's fast: a few millis for a table of 10 million contestants.

I still regarded scores stored in the SQL DB as the source of truth. Redis isn't ACID. It saves to disk only snapshots of its data image in RAM. If the server goes down, it reverts to the last snapshot. So redis and the source of truth could diverge a bit.

This wasn't any problem in my case because instantly returned ranks were acknowledged to be unofficial pending final review by judges. Missing data due to reboots from snapshot were "self healing". That is, if I queried for the rank of a team and it wasn't in the redis store, I added it then re-queried. I also ran a daily sync job to restore perfect agreement. I could run this sync at any time to initialize a new redis from scratch.

This scheme proved extremely fast and robust for 7 years. The implementation it replaced used a custom BerkeleyDB-based service. That one worked well for the preceding 7 years.

One other point is that a redis service can be very handy for other purposes like caches.

score 0 · Answer 8 · answered Jan 20 '18 at 22:46

0

Maybe try denormalizing? For each row in the leaderboard table, include the user's country.

Also, use a CountryID instead of the country name, since ints are faster to query than varchar. (You can look up the name for the country separately.)

Then you could get the count you're seeking without needing a join or subselect - it would just be a select on a single table (and a faster one, since you would be using an int).

answered Jan 20 '18 at 22:46

Taraz

1,242
13
13

Note: For country a tinyint should be big enough. – Taraz Jan 20 '18 at 22:50

score -1 · Answer 9 · edited Jan 16 '18 at 16:36

-1

try something like :

SELECT score FROM leaderboard WHERE id in
    SELECT id FROM users WHERE country='United States' and id=some_user_id

edited Jan 16 '18 at 16:36

Leonardo Alves Machado

2,747
10
38
53

answered Jan 16 '18 at 15:27

FayFay

1
1
1

SQL Server rank query optimization

9 Answers9

Update 1: Just remove computation from where clause and use Joins

Initial answer: Use one of the Ranking functions with a subquery or CTE