Picking the top *n* rows from a randomly sorted table is a simple but slow way to pick 50 rows at random: *all* rows have to be sorted for it.
That hardly matters for small to medium tables and one-time, ad-hoc use. For repeated use on a big table, there are much more efficient ways.
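For reference, the slow baseline looks like this (assuming a table `foo`; the table name is a placeholder):

```sql
-- Simple but slow: sorts the whole table just to keep 50 rows
SELECT *
FROM   foo
ORDER  BY random()
LIMIT  50;
```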
If the ratio of gaps to islands in the primary key is low, use this:
    SELECT row_number() OVER() AS rn, *
    FROM  (
       SELECT *
       FROM  (
          SELECT trunc(random() * 999999)::int AS foo_id
          FROM   generate_series(1, 55) g
          GROUP  BY 1  -- fold duplicates
          ) sub1
       JOIN  foo USING (foo_id)
       LIMIT 50
       ) sub2;
With an index on `foo_id`, this is blazingly fast, no matter how big the table. (A primary key index serves just fine.) Compare performance with `EXPLAIN ANALYZE`.
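To compare, wrap either query in `EXPLAIN` like this (a sketch; substitute your actual query):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   foo
ORDER  BY random()
LIMIT  50;
```

Run both variants and compare the reported total execution time and buffer counts.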
How?
`999999` is an estimated row count of the table, rounded up. You can get it cheaply from:

    SELECT reltuples FROM pg_class WHERE oid = 'foo'::regclass;

Round up to easily include possible new entries since the last `ANALYZE`. You can also use the expression itself in a generic query dynamically; it's cheap. Details:
`55` is your desired number of rows (`50`) in the result, multiplied by a low factor to easily make up for the gap ratio in your table and (unlikely but possible) duplicate random numbers.
If your primary key does not start near 1 (it does not have to be exactly 1; gaps are covered), add the minimum pk value to the calculation:

    min_pkey + trunc(random() * 999999)::int
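Putting that together, a sketch of the full query with the minimum looked up dynamically (the scalar subquery for `min(foo_id)` is an assumption; you can also hard-code the known minimum):

```sql
SELECT *
FROM  (
   SELECT (SELECT min(foo_id) FROM foo)
        + trunc(random() * 999999)::int AS foo_id
   FROM   generate_series(1, 55) g
   GROUP  BY 1  -- fold duplicates
   ) sub1
JOIN  foo USING (foo_id)
LIMIT 50;
```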
Detailed explanation here: