Select random row from a sqlite table

Question

I have a sqlite table with the following schema:

CREATE TABLE foo (bar VARCHAR)

I'm using this table as storage for a list of strings.

How do I select a random row from this table?

multiple http://stackoverflow.com/questions/4114940/select-random-rows-in-sqlite — Ciro Santilli OurBigBook.com, Dec 06 '15 at 09:23

score 304 · Answer 1 · edited Jan 08 '21 at 02:49

304

Have a look at Selecting a Random Row from an SQLite Table

SELECT * FROM table ORDER BY RANDOM() LIMIT 1;

edited Jan 08 '21 at 02:49

Community

1
1

answered Feb 17 '10 at 10:06

Adriaan Stander

162,879
31
289
284

1

How to extend this solution to a join? When using `SELECT a.foo FROM a JOIN b ON a.id = b.id WHERE b.bar = 2 ORDER BY RANDOM() LIMIT 1;` I always get the same row. – Helmut Grohne Sep 19 '13 at 08:18
Is it possible to seed the random number. e.g. Book of the day seeded with unix epoc for today at noon so it shows the same book all day even if the query is run multiple times. Yes I know caching is more efficient for this use case just an example. – danielson317 Apr 22 '20 at 17:18
1

FWIW my question is actually answered here. And the answer is you can not seed the random number. https://stackoverflow.com/questions/24256258/order-by-random-with-seed-in-sqlite – danielson317 Apr 22 '20 at 17:21

Suzanne Soy · Answer 2 · 2016-03-25T18:41:51.067

The following solutions are much faster than anktastic's (the count(*) costs a lot, but if you can cache it, then the difference shouldn't be that big), which itself is much faster than the "order by random()" when you have a large number of rows, although they have a few inconvenients.

If your rowids are rather packed (ie. few deletions), then you can do the following (using (select max(rowid) from foo)+1 instead of max(rowid)+1 gives better performance, as explained in the comments):

select * from foo where rowid = (abs(random()) % (select (select max(rowid) from foo)+1));

If you have holes, you will sometimes try to select a non-existant rowid, and the select will return an empty result set. If this is not acceptable, you can provide a default value like this :

select * from foo where rowid = (abs(random()) % (select (select max(rowid) from foo)+1)) or rowid = (select max(rowid) from node) order by rowid limit 1;

This second solution isn't perfect : the distribution of probability is higher on the last row (the one with the highest rowid), but if you often add stuff to the table, it will become a moving target and the distribution of probabilities should be much better.

Yet another solution, if you often select random stuff from a table with lots of holes, then you might want to create a table that contains the rows of the original table sorted in random order :

create table random_foo(foo_id);

Then, periodicalliy, re-fill the table random_foo

delete from random_foo;
insert into random_foo select id from foo;

And to select a random row, you can use my first method (there are no holes here). Of course, this last method has some concurrency problems, but the re-building of random_foo is a maintainance operation that's not likely to happen very often.

Yet, yet another way, that I recently found on a mailing list, is to put a trigger on delete to move the row with the biggest rowid into the current deleted row, so that no holes are left.

Lastly, note that the behavior of rowid and an integer primary key autoincrement is not identical (with rowid, when a new row is inserted, max(rowid)+1 is chosen, wheras it is higest-value-ever-seen+1 for a primary key), so the last solution won't work with an autoincrement in random_foo, but the other methods will.

Like I just saw on a mailing list, instead of having the fallback method (method 2), you can just use rowid >= [random] instead of =, but it is actually slugissingly slow compared to method 2. — Suzanne Soy, Jan 19 '11 at 20:51
This is a great answer; however it has one problem. `SELECT max(rowid) + 1` will be a slow query -- it requires a full table scan. sqlite only optimizes the query `SELECT max(rowid)`. Thus, this answer would be improved by: `select * from foo where rowid = (abs(random()) % (select (select max(rowid) from foo)+1));` See this for more info: http://sqlite.1065341.n5.nabble.com/performance-question-SELECT-max-rowid-1-td24311.html — dasl, Mar 24 '16 at 16:36
This is a good answer. You can fix the randomness of the distribution by replacing % with ABS(RANDOM() / 9223372036854775808 * ) but that's not very portable. — tekHedd, Oct 17 '20 at 22:59
Thanks for this response - still helpful years later. `rowid >= [random]` performs just as well as the fallback method in recent version of SQLite. I confirmed with a local benchmark (SQLite 3.34.1) running 250k queries for each version `EXPLAIN` also confirms that the execution plan is efficient. In addition, according to the [SQLite query optimizer docs](https://sqlite.org/optoverview.html#the_min_max_optimization), SQLite now optimizes `SELECT max(row) + 1` too. — Max Shenfield, Feb 07 '21 at 05:53
As much as I hate ORDER BY RAND() I just ran a bulk delete of duplicates, and now have a large ROWID gap in my database. Guess what this did to my randomness? Prefer the solution below using count(). — tekHedd, Sep 13 '22 at 18:45

Roberto Góes · Answer 3 · 2017-06-16T16:27:40.660

You need put "order by RANDOM()" on your query.

Example:

select * from quest order by RANDOM();

Let's see an complete example

Create a table:

CREATE TABLE  quest  (
    id  INTEGER PRIMARY KEY AUTOINCREMENT,
    quest TEXT NOT NULL,
    resp_id INTEGER NOT NULL
);

Inserting some values:

insert into quest(quest, resp_id) values ('1024/4',6), ('256/2',12), ('128/1',24);

A default select:

select * from quest;

| id |   quest  | resp_id |
   1     1024/4       6
   2     256/2       12
   3     128/1       24
--

A select random:

select * from quest order by RANDOM();
| id |   quest  | resp_id |
   3     128/1       24
   1     1024/4       6
   2     256/2       12
--

*Each time you select, the order will be different.

If you want to return only one row

select * from quest order by RANDOM() LIMIT 1;
| id |   quest  | resp_id |
   2     256/2       12
--

*Each time you select, the return will be different.

I prefer this solution, since it allows me to search for n lines. In my case, I needed 100 random samples from the database - ORDER BY RANDOM() combined with LIMIT 100 does exactly that. — mnr, Aug 29 '18 at 00:03

score 21 · Answer 4 · answered Nov 06 '10 at 02:00

21

What about:

SELECT COUNT(*) AS n FROM foo;

then choose a random number m in [0, n) and

SELECT * FROM foo LIMIT 1 OFFSET m;

You can even save the first number (n) somewhere and only update it when the database count changes. That way you don't have to do the SELECT COUNT every time.

answered Nov 06 '10 at 02:00

Andres Kievsky

3,461
32
25

1

That's a nice fast method. It doesn't generalize very well to selecting more than 1 row, but the OP only asked for 1, so I guess that's fine. – Ken Williams Dec 27 '12 at 17:04
1

A curious thing to note is that the time required to find the `OFFSET` seems to go up depending on the size of the offset - row 2 is fast, row 2 million takes a while, even when all the data in the is fixed-size and it should be able to seek directly to it. At least, that's what it looks like in SQLite 3.7.13. – Ken Williams Dec 27 '12 at 17:06
1

@KenWilliams Pretty much all databases have the same problem with `OFFSET``. It is a very inefficient way to query a database because it needs to read that many rows even though it will only return 1. – Jonathan Allen Apr 14 '16 at 10:26
1

Note that I was talking about /fixed size/ records though - it should be easy to scan directly to the correct byte in the data (*not* reading that many rows), but they'd have to implement the optimization explicitly. – Ken Williams Apr 15 '16 at 02:34
@KenWilliams: there aren't fixed sized records in SQLite, it is dynamically typed and the data doesn't have to match the declared affinities (https://www.sqlite.org/fileformat2.html#section_2_1). Everything is stored in b-tree pages, so either way it has to do at least a b-tree search towards the leaf. To accomplish this efficiently it would need to store the size of the subtree along with each child pointer. It would be too much of an overhead for little benefit, as you still won't be able to optimize the OFFSET for joins, order by, etc... (and without ORDER BY the order is undefined.) – Yakov Galka Jun 13 '16 at 14:57

score 15 · Answer 5 · edited Jul 04 '17 at 20:19

Here is a modification of @ank's solution:

SELECT * 
FROM table
LIMIT 1 
OFFSET ABS(RANDOM()) % MAX((SELECT COUNT(*) FROM table), 1)

This solution also works for indices with gaps, because we randomize an offset in a range [0, count). MAX is used to handle a case with empty table.

Here are simple test results on a table with 16k rows:

sqlite> .timer on
sqlite> select count(*) from payment;
16049
Run Time: real 0.000 user 0.000140 sys 0.000117

sqlite> select payment_id from payment limit 1 offset abs(random()) % (select count(*) from payment);
14746
Run Time: real 0.002 user 0.000899 sys 0.000132
sqlite> select payment_id from payment limit 1 offset abs(random()) % (select count(*) from payment);
12486
Run Time: real 0.001 user 0.000952 sys 0.000103

sqlite> select payment_id from payment order by random() limit 1;
3134
Run Time: real 0.015 user 0.014022 sys 0.000309
sqlite> select payment_id from payment order by random() limit 1;
9407
Run Time: real 0.018 user 0.013757 sys 0.000208

This is glorious, but I do have to wonder if there's a workaround for the "OFFSET is inefficient" problem. The only other reasonable alternative seems to be to maintain a column in the table explicitly for randomizing... — tekHedd, Sep 13 '22 at 18:48

score 13 · Answer 6 · answered Feb 17 '10 at 10:07

13

SELECT   bar
FROM     foo
ORDER BY Random()
LIMIT    1

answered Feb 17 '10 at 10:07

Svetlozar Angelov

21,214
6
62
67

12

Since it will select the whole table content first, wouldn't this be very time-consuming for large tables? – Alex_coder Feb 17 '10 at 10:20
1

Can't you just limit the scope using "WHERE" condition(s)? – jldupont Aug 19 '10 at 01:16

score 4 · Answer 7 · edited Jun 20 '20 at 09:12

4

I came up with the following solution for the large sqlite3 databases:

SELECT * FROM foo WHERE rowid = abs(random()) % (SELECT max(rowid) FROM foo) + 1;

The abs(X) function returns the absolute value of the numeric argument X.

The random() function returns a pseudo-random integer between -9223372036854775808 and +9223372036854775807.

The operator % outputs the integer value of its left operand modulo its right operand.

Finally, you add +1 to prevent rowid equal to 0.

edited Jun 20 '20 at 09:12

Community

1
1

answered Dec 21 '17 at 01:58

Max

1,685
16
21

7

Good try but I don't think this will work. What if a row with rowId = 5 was deleted, but rowIds 1,2,3,4,6,7,8,9,10 still exist? Then, if the random rowId chosen is 5, this query will return nothing. – Calicoder Aug 14 '19 at 17:43

Select random row from a sqlite table

7 Answers7

Linked

Related