If the order needs to be "shuffled" but not truly random...
(Update: see my other answer for a more flexible and randomizable solution.)
You say "random" order, which is what you get with `ORDER BY random()`: for each row, PostgreSQL calls `random()`, gets a value, and uses that value to decide where the row sorts within the result set.
To make this repeatable, you have to mess with seeds. This feels icky. According to the docs:

> the effects will persist until the end of the session, unless overridden by another SET

I think this means that when using a connection pool, `setseed` mutates the connection's state, which then carries over to the next process that checks out that connection.
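For intuition, the repeatability that seeding buys you looks like this in Python (a sketch only; `seeded_order` is a hypothetical name, and `random.Random` stands in for PostgreSQL's `setseed()`/`random()` pair; using a local generator avoids exactly the shared-state problem described above):

```python
import random

def seeded_order(items, seed):
    """Shuffle a copy of items deterministically from a seed,
    roughly analogous to setseed(seed) followed by ORDER BY random()."""
    rng = random.Random(seed)  # local generator: no global/session state mutated
    out = list(items)
    rng.shuffle(out)
    return out

ids = [1, 2, 3, 4, 5, 6, 7, 8]
# same seed, same order every time
assert seeded_order(ids, 0.42) == seeded_order(ids, 0.42)
```

The point of the local `random.Random` instance is the contrast with `setseed`, which changes session-wide state that outlives the request on a pooled connection.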
What about modulo?
I have a case where I don't need true randomness. My criteria are:
- not the same order every time
- predictable order within pages of the same result set, so that we don't get duplicates on subsequent pages
Eg, this would be fine:
- Listing 1
- page 1: items 1, 4
- page 2: items 3, 2
- Listing 2 (different user, or same user coming back later)
- page 1: items 3, 1
- page 2: items 2, 4
To get something like this, modulo seems to work well. Eg, `ORDER BY id % 7, id` for all pages of request 1, and `ORDER BY id % 11, id` for all pages of request 2. That is, for each row, divide its id by the modulus and sort by the remainder; within rows that share a remainder, sort by id to keep the sort stable.
The modulus could be picked randomly for the first page, then reused as a parameter for each subsequent page request.
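The scheme can be simulated outside the database (a sketch with hypothetical names; `modulo_pages` mimics `ORDER BY id % m, id` combined with `LIMIT`/`OFFSET` paging):

```python
def modulo_pages(ids, modulus, page_size):
    """Order ids by (id % modulus, id), then split into pages,
    mimicking ORDER BY id % m, id with LIMIT/OFFSET."""
    ordered = sorted(ids, key=lambda i: (i % modulus, i))
    return [ordered[k:k + page_size] for k in range(0, len(ordered), page_size)]

ids = list(range(1, 11))

# Same modulus for every page request: stable order, no duplicates across pages.
assert modulo_pages(ids, 7, 4) == [[7, 1, 8, 2], [9, 3, 10, 4], [5, 6]]

# A different modulus (a new "listing") gives a different order,
# but still covers every id exactly once.
flat = [i for page in modulo_pages(ids, 3, 4) for i in page]
assert sorted(flat) == ids
```

Because the order depends only on `(modulus, id)`, any client holding the same modulus sees the same stable sequence, which is what makes the pagination duplicate-free.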
You can see how this might work for your database like this:

```shell
echo "SELECT id, id % 77 FROM my_table ORDER BY id % 77, id" | psql my_db > sort.txt
```
A prime modulus will probably give you the most variation. And if your ids start at 1 (such that `% 77` would leave the first 77 or so rows in roughly their normal order), you could try a modulus on a timestamp field instead. Eg:
`ORDER BY (extract(epoch from inserted_at) * 100000)::bigint % 77`
But you'd need an expression index to make that performant.
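A quick way to sanity-check that expression's behavior, sketched in Python (assumptions: `inserted_at` is a UTC timestamp, and `int()` truncation stands in for the `::bigint` cast):

```python
from datetime import datetime, timezone

def ts_sort_key(inserted_at, modulus=77):
    """Python analogue of (extract(epoch from inserted_at) * 100000)::bigint % modulus."""
    return int(inserted_at.timestamp() * 100000) % modulus

a = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
b = datetime(2024, 1, 1, 12, 0, 1, tzinfo=timezone.utc)

# Keys land in [0, modulus), and nearby timestamps get unrelated keys,
# which is what spreads the sort order around.
assert 0 <= ts_sort_key(a) < 77
assert ts_sort_key(a) != ts_sort_key(b)
```

One caveat: an expression index only helps when the query uses the exact indexed expression, so an index built with `% 77` won't help a query sorting by `% 11`. If the modulus varies per listing, the index covers only the one value it was built with.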