66

In MySQL, you can select X random rows with the following statement:

SELECT * FROM table ORDER BY RAND() LIMIT X

This does not, however, work in SQLite. Is there an equivalent?

Fahad Sadah

6 Answers

101

For much better performance, use:

SELECT * FROM table WHERE id IN (SELECT id FROM table ORDER BY RANDOM() LIMIT x)

SQL engines first load the projected fields of the rows into memory and then sort them. Here we do the random sort only on the id column of each row, which is already in memory because it is indexed, take X of those ids, and then fetch the whole rows for those X ids.

So this consumes less RAM and CPU as the table grows!
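
For example, a minimal sketch you could run in the sqlite3 shell (the articles table and its columns are made up purely for illustration):

-- Hypothetical schema, for illustration only.
CREATE TABLE articles (
    id    INTEGER PRIMARY KEY,
    title TEXT
);
INSERT INTO articles (title) VALUES ('a'), ('b'), ('c'), ('d'), ('e');

-- Pick 2 random rows: shuffle only the indexed id values in the subquery,
-- then fetch the full rows for the chosen ids.
SELECT *
FROM articles
WHERE id IN (SELECT id FROM articles ORDER BY RANDOM() LIMIT 2);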

Ali
  • Method 2 was _significantly_ faster than the accepted answer. Thanks! – Alex Guerra Aug 03 '16 at 16:08
  • This is brilliant thanks, I've updated the accepted answer :) – Fahad Sadah Jan 10 '17 at 17:17
  • How random is `ORDER BY RANDOM()`? Because... I do not feel like it is really random... There is seriously a group of rows that gets chosen very frequently... Doesn't anyone else have the same feeling? – ghchoi Dec 02 '17 at 15:42
  • Should I always use limit 1, or can I just use fetchone with sqlite3 in Python? – Marat Mkhitaryan Feb 24 '19 at 15:31
  • how would you, or can you, put this into the query() method? – Peter G. Williams Apr 28 '19 at 20:27
  • @Amboseli Have you tried passing "id IN (SELECT id FROM table ORDER BY RANDOM() LIMIT ?)" as the `selection` parameter, and passing `new String[] { String.valueOf(x) }` as the `selectionArgs`? – LarsH Sep 06 '19 at 20:42
  • For anyone wondering what Alex's comment refers to, the answer used to show 2 methods, and the one that's currently shown is method 2. I'm guessing that another answer (Donnie's) used to be the accepted answer. – LarsH Sep 06 '19 at 20:44
62
SELECT * FROM table ORDER BY RANDOM() LIMIT X
vvvvv
Donnie
  • For the record: this is working, but is slow on larger tables. A faster way (although not quite the same) would be: `SELECT * FROM table WHERE random() % k = 0 LIMIT n;`. Drawback of this is that records with lower primary keys get a higher chance of being selected. [Taken from here](http://blog.rodolfocarvalho.net/2012/05/how-to-select-random-rows-from-sqlite.html) – ren Mar 09 '14 at 16:08
  • Yes, it will be slow on big tables as it ends up forcing a table scan. Downside of wanting to do stuff like this in SQL. Best way would be to pick random offsets in your front end. – Donnie May 12 '14 at 18:14
10
SELECT * FROM table ORDER BY RANDOM() LIMIT 1
Gareth Davis
bkaid
4

The accepted answer works, but requires a full table scan per query. This will get slower and slower as your table grows large, making it risky for queries that are triggered by end-users.

The following solution takes advantage of indexes to run in O(log(N)) time.

SELECT * FROM table
WHERE rowid > (
  ABS(RANDOM()) % (SELECT max(rowid) FROM table)
)
LIMIT 1;

To break it down:

  • SELECT max(rowid) FROM table - Returns the largest valid rowid in the table. SQLite can use the index on rowid to run this efficiently.
  • ABS(RANDOM()) % ... - Returns a random number between 0 and max(rowid) - 1. SQLite's random function generates a number between -9223372036854775808 and +9223372036854775807. The ABS makes sure it's positive, and the modulo operator restricts it to the range 0 to max(rowid) - 1.
  • rowid > ... - Rather than using =, use > in case the random number generated corresponds to a deleted row. Using strictly greater than ensures that we return a row with a rowid between 1 (greater than 0) and max(rowid) (greater than max(rowid) - 1). SQLite uses the primary key index to return this result efficiently as well.

This also works for queries with WHERE clauses. Apply the WHERE clause to both the outer query and the SELECT max(rowid) subquery. I'm not sure under which conditions this will run efficiently, however.
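
For instance, a filtered variant might look like the sketch below (the events table and the status column are hypothetical names, not from this answer); the same condition appears in both the outer query and the max(rowid) subquery:

-- Sketch only: 'events' and 'status' are made-up names.
SELECT *
FROM events
WHERE status = 'active'
  AND rowid > (
    ABS(RANDOM()) % (SELECT max(rowid) FROM events WHERE status = 'active')
  )
LIMIT 1;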

Note: This was derived from an answer in a similar question.

Max Shenfield
  • Been using this approach and it works well (fast). – Vass Apr 16 '22 at 21:11
  • This solution is awesome up to the point where you need to do a mass delete, after which half of your results will be the first row after the "hole" in the ROWIDs. Better randomness results from either maintaining an order column without holes, or using some variation on "LIMIT rand()", although I'm not sure how to do that with sqlite (if I figure it out, maybe I'll post an answer :). – tekHedd Sep 13 '22 at 18:35
3

All the answers here are based on ORDER BY. This is very inefficient (i.e. unusable) for large sets, because you evaluate RANDOM() for every record and then sort by it, which is a resource-expensive operation.

Another approach is to place abs(CAST(random() AS REAL))/9223372036854775808 < 0.5 in the WHERE clause, which in this example gives each row a 0.5 chance of being selected.

SELECT *
FROM table
WHERE abs(CAST(random() AS REAL))/9223372036854775808 < 0.5

The large number is the maximum absolute value that random() can produce. The abs() is needed because the result is signed. The expression is a uniformly distributed random variable between 0 and 1.

This has its drawbacks. You cannot guarantee a result, and if the threshold is large compared to the table, the selected data will be skewed towards the start of the table. But in some carefully designed situations, it can be a feasible option.
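
As a rough sketch with made-up numbers (big_table is a hypothetical name): to draw roughly 100 rows out of about 1,000,000, you could set the threshold to 100.0 / 1,000,000 = 0.0001 and, as discussed in the comments below, add an ORDER BY RANDOM() with a LIMIT to cap and reshuffle whatever survives the filter:

-- Illustrative only: roughly 100 of ~1,000,000 rows are expected to pass the filter;
-- the final ORDER BY / LIMIT then shuffles and caps the survivors.
SELECT *
FROM big_table
WHERE abs(CAST(random() AS REAL)) / 9223372036854775808 < 0.0001
ORDER BY RANDOM()
LIMIT 100;

Note that this still does not guarantee exactly 100 rows; when fewer than 100 rows pass the filter, you simply get fewer results.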

Davor Josipovic
  • If you add an `ORDER BY RANDOM()`, does that occur before or after your optimization? If after, then that fixes the skew, and also makes this scale for N>1 selected. – Mooing Duck Oct 22 '22 at 07:32
  • RANDOM would be executed on the final result, after WHERE and before ORDER. – Davor Josipovic Oct 23 '22 at 08:22
3

This one handles random()'s negative integers, and keeps good performance on large datasets:

SELECT * FROM table LIMIT 1 OFFSET abs(random() % (select count(*) from table));

where abs(random() % n) gives you an integer in the range 0 to n - 1, i.e. a valid offset into the n rows.
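
A self-contained sketch for the sqlite3 shell (the colors table is made up for illustration):

-- Hypothetical table, for illustration only.
CREATE TABLE colors (name TEXT);
INSERT INTO colors (name) VALUES ('red'), ('green'), ('blue'), ('cyan'), ('pink');

-- count(*) is 5, so abs(random() % 5) is an integer from 0 to 4:
-- a valid OFFSET for each of the 5 rows.
SELECT * FROM colors
LIMIT 1
OFFSET abs(random() % (SELECT count(*) FROM colors));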

Evhz
  • I was wondering how this window offset solution would compare to others, in terms of performance. I wish an answer had an analysis. – Mooing Duck Oct 22 '22 at 07:16
  • I also just realized I want to select N random rows, and this method doesn't extend past 1 row cleanly. I bet it's still the best answer for 1 row. – Mooing Duck Oct 22 '22 at 07:29