1

I have a high score table in SQLITE that supports three simple operations:

  1. Adding a new user along with their high score.
  2. Updating an existing users high score.
  3. Getting the leaderboard (all users and scores in order of score):
    SELECT * FROM scores ORDER BY high_score DESC

Everything works, but I'm worried how this scales: with 10,000 users sorting the high scores takes ~60ms which is OK, but this time goes up roughly linearly such that if I had 100,000 users requesting the high scores would take ~600ms which is far too slow.

Is there a smart way to insert new users/update their scores in order to avoid having to do a complete sort whenever I retrieve the leaderboard? E.g. Something like a C++ priority_queue or a python heapq.

I suppose I could sort and replace the entire database on every insert (e.g. Sort an entire SQLite table) but that seems like overkill.

Community
  • 1
  • 1
dgmp88
  • 537
  • 6
  • 13

3 Answers3

5

If you are concerned about order by performance in a select, then create an index. For a simple query such as:

select s.*
from scores s
order by s.high_score desc;

You want an index on scores(high_score):

create index idx_scores_highscore on scores(high_score desc);
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    Gives a massive speedup, thanks! Up to hundreds of times, depending on the size of the database. Gist with demo in python [here](https://gist.github.com/dgmp88/30ecfaf5a901f94f902d) in case it's useful for anyone. – dgmp88 Jan 03 '16 at 21:19
1

Sorting/ordering is done on a SELECT, not on an INSERT. Relational databases don't guarantee sort order when it isn't explicitly defined by the query.

In general it's likely that you'll get records back in the same order in which they were inserted when querying any given table, particularly if the primary key is a simple auto-incrementing number since the DB will likely use the PK as the sort (if it's the clustered index). But it's not guaranteed. And as soon as you start joining other tables all bets are off, because the DB has a lot more logic to optimize that query.

Basically, if you want to get back records in a specific order, you have to specify that order in your SELECT.

If performance is a problem, profile your queries and look for the performance problems. Is the sort column indexed? Is a full table scan being performed somewhere? Something else going wrong? Determine the execution plan for the query and address specific bottlenecks.

David
  • 208,112
  • 36
  • 198
  • 279
  • this is a great explanation, I like how you mention the autoincrement, which in SQLite happens to an implicit column called `ROWID` that is always incremented upon insert unless it is explicitly turned off. Still, I've seen people rely heavily on `ROWID` order when no joins are present. Why does `ROWID` seem to guarantee consistent order without any `JOIN` present and might it improve performance by skipping `ORDER BY` can you elaborate? This helps understand why this anti-pattern gets exploited at times. Thank you! – ecoe Jun 27 '22 at 14:55
1

As others have said, ordering needs to be done on the 'select' part.

Here are a couple of performance tipps:

Only query the data you really need and care for. Don't select all columns if you don't need them. If you are only interested in the "top 20" or so, limit your results.

Cache the output. This means: store the result of your 'select' and invalidate your cache on changes to the table. You might be able to achieve this with triggers automatically, but doing it "manually" when fiddling with the data should be easy as well.

Eiko
  • 25,601
  • 15
  • 56
  • 71