I have read many posts on this topic, such as mysql-get-rank-from-leaderboards.
However, none of the solutions are efficient at scale for getting a range of ranks from the database.
The problem is simple. Suppose we have a Postgres table with an "id" column and another INTEGER column whose values are not unique, but we have an index for this column.
e.g. table could be:
CREATE TABLE my_game_users (id serial PRIMARY KEY, rating INTEGER NOT NULL);
The goal
- Define a rank for users ordering users on the "rating" column descending
- Be able to query for a list of ~50 users ordered by this new "rank", centered at any particular user
- For example, we might return users with ranks { 15, 16, ..., 64, 65 } where the center user has rank #40
- Performance must scale, e.g. be under 80 ms for 100,000 users.
Attempt #1: row_number() window function
WITH my_ranks AS
(SELECT my_game_users.*, row_number() OVER (ORDER BY rating DESC) AS rank
FROM my_game_users)
SELECT *
FROM my_ranks
WHERE rank >= 4000 AND rank <= 4050
ORDER BY rank ASC;
This "works", but the queries average 550ms with 100,000 users on a fast laptop without any other real work being done.
I tried adding indexes, and re-phrasing this query to not use the "WITH" syntax, and nothing worked to speed it up.
Attempt #2 - count the number of rows with a greater rating value I tried a query like this:
SELECT t1.*,
(SELECT COUNT(*)
FROM my_game_users t2
WHERE (t1.rating, -t1.id) <= (t2.rating, -t2.id)
) AS rank
FROM my_game_users t1
WHERE id = 2000;
This is decent, this query takes about 120ms with 100,000 users having random ratings. However, this only returns the rank for user with a particular id (2000).
I can't see any efficient way to extend this query to get a range of ranks. Any attempt at extending this makes a very slow query.
I only know the ID of the "center" user, since the users have to be ordered by rank before we know which ones are in the range!
Attempt #3: in-memory ordered Tree
I ended up using a Java TreeSet to store the ranks. I can update the TreeSet whenever a new user is inserted into the database, or a user's rating changes.
This is super fast, around 25 ms with 100,000 users.
However, it has a serious drawback that it's only updated on the Webapp node that serviced the request. I'm using Heroku and will deploy multiple nodes for my app. So, I needed to add a scheduled task for the server to re-build this ranking tree every hour, to make sure the nodes don't get too out-of-sync!
If anyone knows of an efficient way to do this in Postgres with full solution, then I am all ears!