We are building a massive multi-player educational game with some millions of entries in the leader-board (based on aggregated XPs gained). After a game finishes, we need to show the leaderboard and how this player/student is ranked.
But there are a couple of filters for this leaderboard (global/by country, by month/year/today, by age etc) that can be mixed together e.g. 'Get me the leaderboard for my Country
for the last month
'. Number of combinations is ~20.
My problem is how to store such a structure that is updated regularly; recalculation of rankings must be done after each game. A typical full leaderboard at the moment has ~5 millions of entries for players coming from >150 countries.
I used to have a MySQL Cluster Table (userid, xps, countryid) with 3 nodes, but ordering by XPs (either in DBMS or application which required all data from DB) proven to be too slow as numbers got bigger (>20K of users). This is an interesting post but again half a second for each query is too much.
Then we used REDIS (see this post), but filtering is the problem here. We used separate lists for TOP 5 and the rest. TOP 5 was updated instantly, for the rest there was some delay of 20-30 minutes. We in fact ranked this user based on a cached instance of the Leaderboard (using the real XPs though, not the cached), so this was acceptable. Real-time on non-Top5 is not a prerequisite. This is fine for one global ranking, but how to filter the results based on month and/or country and/or age. Do we need to keep a list for every filtering combination?
We also tested custom structures in Java (using it as a Java caching server similar in functionality with REDIS), still experimenting with it. Which is the best combination of structures to achieve our goal? We ended up using one list per filtering combination e.g.
Map<FilteringCombination, SortedList<User>>
and then doing binary search to the list of a specific key. This way, a finished game requires a couple of insertions say X, but it requires X*NumOfPlayers space, which is X times more than keeping a single list (not sure if this can fit to memory but we can always create a cluster here by splitting combinations to different servers). There is an issue here on how to rebuild the cache in case of failure, but that is another problem we can deal with.Extending the above method, we might slightly improve performance if we define scoring buckets inside each list (eg a bucket for 0-100xp, another for 101 - 1000xp, another for 1001 - 10000xp etc). The bucket splitting policy will be based on the players' xp distribution in our game. It's true that this distribution is dynamic in real world, but we have seen that after a few months changes are minor, having in mind that XPs are always increasing but new users are coming as well.
We are also testing Cassandra's natural ordering by utilizing clustering keys and white-rows feature, although we know that having some millions of rows may not be easy to handle.
All in all, that is what we need to achieve. If a user (let's name her UserX) is not included in the Top5 list, we need to show this user's ranking together with some surrounding players (eg 2 above and 2 below) as the example below:
Global TOP 5 My Global Ranking (425) My Country Ranking Other Rankings
1. karen (12000xp) 423. george 1. david
2. greg (11280xp) 424. nancy 2. donald
3. philips (10293xp) **425. UserX** 3. susan
4. jason (9800xp) 426. rebecca **4. UserX**
5. barbara (8000xp) 427. james 5. teresa
I've studied many SO or other posts, but still cannot find a solution for efficiently updating and filtering large Leaderboard tables. Which one candidate solution would you choose and what are the possible performance improvements (space + memory + (Insertion/Searching CPU cost))?