How to efficiently get a range of ranked users (for a leaderboard) using Postgresql

Question

I have read many posts on this topic, such as mysql-get-rank-from-leaderboards.

However, none of the solutions are efficient at scale for getting a range of ranks from the database.

The problem is simple. Suppose we have a Postgres table with an "id" column and another INTEGER column whose values are not unique, but we have an index for this column.

e.g. table could be:

CREATE TABLE my_game_users (id serial PRIMARY KEY, rating INTEGER NOT NULL);

The goal

Define a rank for users ordering users on the "rating" column descending
Be able to query for a list of ~50 users ordered by this new "rank", centered at any particular user
For example, we might return users with ranks { 15, 16, ..., 64, 65 } where the center user has rank #40
Performance must scale, e.g. be under 80 ms for 100,000 users.

Attempt #1: row_number() window function

WITH my_ranks AS 
  (SELECT my_game_users.*, row_number() OVER (ORDER BY rating DESC) AS rank
   FROM my_game_users)
SELECT *
FROM my_ranks
WHERE rank >= 4000 AND rank <= 4050
ORDER BY rank ASC;

This "works", but the queries average 550ms with 100,000 users on a fast laptop without any other real work being done.

I tried adding indexes, and re-phrasing this query to not use the "WITH" syntax, and nothing worked to speed it up.

Attempt #2 - count the number of rows with a greater rating value I tried a query like this:

SELECT  t1.*,
  (SELECT  COUNT(*)
   FROM my_game_users t2
   WHERE (t1.rating, -t1.id) <= (t2.rating, -t2.id)
  ) AS rank
FROM my_game_users t1
WHERE id = 2000;

This is decent, this query takes about 120ms with 100,000 users having random ratings. However, this only returns the rank for user with a particular id (2000).

I can't see any efficient way to extend this query to get a range of ranks. Any attempt at extending this makes a very slow query.

I only know the ID of the "center" user, since the users have to be ordered by rank before we know which ones are in the range!

Attempt #3: in-memory ordered Tree

I ended up using a Java TreeSet to store the ranks. I can update the TreeSet whenever a new user is inserted into the database, or a user's rating changes.

This is super fast, around 25 ms with 100,000 users.

However, it has a serious drawback that it's only updated on the Webapp node that serviced the request. I'm using Heroku and will deploy multiple nodes for my app. So, I needed to add a scheduled task for the server to re-build this ranking tree every hour, to make sure the nodes don't get too out-of-sync!

If anyone knows of an efficient way to do this in Postgres with full solution, then I am all ears!

Depending on how your updates work, perhaps consider a materialized view? — chrylis -cautiouslyoptimistic-, Aug 02 '15 at 00:50
In your first attempt why you order the rank `desc` in the over function then you reorder again as asc? I think if you use only one order by (in your case the one on the over function) maybe it can be faster. — Jorge Campos, Aug 02 '15 at 00:52
[Pagination Done the Right Way](http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way) — Vladimir Baranov, Aug 04 '15 at 02:29
As far as Node JS goes, for performance-critical requests, consider using https://github.com/brianc/node-pg-query-stream — vitaly-t, Aug 04 '15 at 20:56
Are you using redis by any chance? If you are redis zorted sets are tailor made for this purpose and very fast and scales. — e4c5, Aug 16 '15 at 02:06

FuzzyTree · Answer 1 · 2015-08-04T02:27:17.747

You can get the same results by using order by rating desc and offset and limit to get users between a certain rank.

WITH my_ranks AS 
    (SELECT my_game_users.*, row_number() OVER (ORDER BY rating DESC) AS rank FROM my_game_users)
SELECT * FROM my_ranks WHERE rank >= 4000 AND rank <= 4050 ORDER BY rank ASC;

The query above is the same as

select * , rank() over (order by rating desc) rank 
from my_game_users 
order by rating desc
limit 50 offset 4000

If you want to select users around rank #40 you could select ranks #15-#65

select *, rank() over (order by rating desc) rank 
from my_game_users 
order by rating desc
limit 50 offset 15

score 1 · Answer 2 · answered Aug 04 '15 at 01:51

Thanks, @FuzzyTree ! Your solution doesn't quite give me everything I need, but it nudged me in the right direction. Here's the full solution I'm going with for now.

The only limitation with your solution is that there's no way to get a unique rank for a particular user. All users with the same rating would have the same rank (or at least it is undefined by SQL standard). If I knew the OFFSET ahead of time, then your rank would be good enough, but I have to get the rank of a particular user first.

My solution is to do the following query to get a range of ranks:

SELECT * FROM my_game_users ORDER BY rating DESC, id ASC LIMIT ? OFFSET ?

This is basically uniquely defining the ranks by rating, then by who joined the Game first (lower id). To make this efficient I'm creating an index on (rating DESC, id)

Then, I'm getting a particular user's rank to plug in to this query with:

SELECT COUNT(*) FROM my_game_users WHERE rating > ? OR (rating = ? AND id < ?)

I actually made this more efficient with:

SELECT (SELECT COUNT(*) FROM my_game_users WHERE rating > ?) + (SELECT COUNT(*) FROM my_game_users WHERE rating = ? AND id < ?) + 1

Now, even with these queries it takes about 78ms average and median time to get the ranks around a user. If anyone has a good idea how to speed these up I'm all ears!

For example, getting a range of ranks takes about 60ms, and explaining it yields:

EXPLAIN SELECT * FROM word_users ORDER BY rating DESC, id ASC LIMIT 50 OFFSET 50000;

"Limit (cost=6350.28..6356.63 rows=50 width=665)" " -> Index Scan using idx_rating_desc_and_id on word_users (cost=0.29..12704.83 rows=100036 width=665)"

So, it's using the rating and id index, yet it still has this highly variable cost from 0.29...12704.83. Any ideas how to improve??

See my update you can interchange rank() with row_number() depending on what you need. It should be relatively fast — FuzzyTree, Aug 04 '15 at 02:29

score 0 · Answer 3 · edited Jun 20 '20 at 09:12

0

If you order it in desc order you have it in the right order. Use the rownumber() function. Select Row number in postgres

Also you would use an in memory cache to store stuff in memory. Something like redis. Its a separate application that can serve multiple instances, even remotely.

edited Jun 20 '20 at 09:12

Community

1
1

answered Aug 02 '15 at 00:35

jollarvia

349
4
9

You sir, did not read the full post. I already tried row_number(), but it is far too slow with 100,000 users! – Charles Capps Aug 04 '15 at 01:43

How to efficiently get a range of ranked users (for a leaderboard) using Postgresql

3 Answers3