
I'm looking for a way to join these two queries (or run these two together):

SELECT  s
FROM    generate_series(1, 50) s;

With this query:

SELECT id FROM foo ORDER BY RANDOM() LIMIT 50;

In a way where I get 50 rows like this:

series, ids_from_foo
1, 53
2, 34
3, 23

I've been at it for a couple of days now and I can't figure it out. Any help would be great.

newUserNameHere

2 Answers


Use row_number():

select row_number() over() as rn, a
from (
    select a
    from foo
    order by random()
    limit 50
) s
order by rn;
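
For illustration, the same pattern adapted to the column names from the question (assuming foo has an id column, as the question's own query suggests):

select row_number() over() as series, id as ids_from_foo
from (
    select id
    from foo
    order by random()
    limit 50
) s
order by series;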
Clodoaldo Neto

Picking the top n rows from a randomly sorted table is a simple but slow way to pick 50 rows at random: all rows have to be sorted for it.

That doesn't matter much for small to medium tables and one-time, ad-hoc use. For repeated use on a big table, there are much more efficient ways. If the ratio of gaps / islands in the primary key is low, use this:

SELECT row_number() OVER() AS rn, *
FROM  (
   SELECT *
   FROM  (
       SELECT trunc(random() * 999999)::int AS foo_id
       FROM   generate_series(1, 55) g
       GROUP  BY 1                     -- fold duplicates
       ) sub1
   JOIN   foo USING (foo_id)
   LIMIT  50
   ) sub2;

With an index on foo_id, this is blazingly fast, no matter how big the table is. (A primary key serves just fine.) Compare performance with EXPLAIN ANALYZE.
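
For example (a sketch; the actual plans and timings depend on your data and indexes), prefix both variants with EXPLAIN ANALYZE and compare the reported total runtimes:

EXPLAIN ANALYZE
SELECT id FROM foo ORDER BY random() LIMIT 50;  -- slow variant: sorts all rows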

How?

999999 is an estimated row count of the table, rounded up. You can get it cheaply from:

SELECT reltuples FROM pg_class WHERE oid = 'foo'::regclass;

Round up to easily include possible new entries since the last ANALYZE. You can also use the expression itself dynamically in a generic query; it's cheap.
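
For illustration, a sketch of how that estimate might be plugged in directly instead of hard-coding 999999 (assuming the table has been analyzed, so reltuples holds a usable estimate; the 1.05 padding is an arbitrary stand-in for "round up"):

SELECT trunc(random() * (SELECT reltuples FROM pg_class WHERE oid = 'foo'::regclass) * 1.05)::int AS foo_id
FROM   generate_series(1, 55) g
GROUP  BY 1;  -- fold duplicates, as above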

55 is your desired number of rows in the result (50), multiplied by a low factor to easily make up for the gap ratio in your table and for (unlikely but possible) duplicate random numbers.

If your primary key does not start near 1 (it does not have to be exactly 1; gaps are covered), add the minimum pk value to the calculation:

min_pkey + trunc(random() * 999999)::int
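
A sketch of what the inner generator could look like with that offset (assuming an index on foo_id, so min(foo_id) is cheap; the scalar subquery is evaluated only once):

SELECT (SELECT min(foo_id) FROM foo)
     + trunc(random() * 999999)::int AS foo_id
FROM   generate_series(1, 55) g
GROUP  BY 1;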


Erwin Brandstetter
  • The MVCC row count issue used to drive me nuts when I first migrated to Postgres from MySQL. Now MySQL drives me nuts. Good explanation (in the other post too). How any RDBMS can live without a generate_series equivalent would be another interesting question. – John Powell Aug 29 '14 at 19:53
  • @JohnBarça `generate_series` can be emulated by selecting row numbers from any large enough table. There are also other DB-specific ways to replace `generate_series`, like `connect by level <= 50` in Oracle. – Ihor Romanchenko Aug 29 '14 at 21:16
  • @IgorRomanchenko True, but selecting rows from existing tables isn't really an equivalent solution, and obviously you can do anything in Oracle if you are prepared to pay for it. My comments were more directed at SQL Server and MySQL. – John Powell Aug 29 '14 at 21:25