If the order needs to be "shuffled" but not truly random...
(Update: see my other answer for a more flexible and randomizable solution.)
You say "random" order, which is what you get with `ORDER BY random()`: for each row, PostgreSQL calls `random()`, gets a value, and uses that value to decide where the row sorts within the result set.
To make this repeatable, you have to mess with seeds. This feels icky. According to the docs:

> the effects will persist until the end of the session, unless overridden by another SET

I think this means that when using a connection pool, `setseed` mutates the connection's state, which then carries over to the next process that checks out that connection.
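For intuition, the repeatability that seeding buys you looks like this in Python (a sketch only; `seeded_order` is a hypothetical name, and `random.Random` stands in for PostgreSQL's `setseed()`/`random()` pair; using a local generator avoids exactly the shared-state problem described above):

```python
import random

def seeded_order(items, seed):
    """Shuffle a copy of items deterministically from a seed,
    roughly analogous to setseed(seed) followed by ORDER BY random()."""
    rng = random.Random(seed)  # local generator: no global/session state mutated
    out = list(items)
    rng.shuffle(out)
    return out

ids = [1, 2, 3, 4, 5, 6, 7, 8]
# same seed, same order every time
assert seeded_order(ids, 0.42) == seeded_order(ids, 0.42)
```

The point of the local `random.Random` instance is the contrast with `setseed`, which changes session-wide state that outlives the request on a pooled connection.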
What about modulo?
I have a case where I don't need true randomness. My criteria are:
- not the same order every time
- predictable order within pages of the same result set, so that we don't get duplicates on subsequent pages
Eg, this would be fine:
- Listing 1
- page 1: items 1, 4
- page 2: items 3, 2
- Listing 2 (different user, or same user coming back later)
- page 1: items 3, 1
- page 2: items 2, 4
To get something like this, modulo seems to work well. Eg, `ORDER BY id % 7, id` for all pages of request 1, and `ORDER BY id % 11, id` for all pages of request 2. That is, for each row, divide its id by the modulus and sort by the remainder; within rows that share a remainder, sort by id to keep the sort stable.
The modulus could be picked randomly for the first page, then reused as a parameter for each subsequent page request.
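The scheme can be simulated outside the database (a sketch with hypothetical names; `modulo_pages` mimics `ORDER BY id % m, id` combined with `LIMIT`/`OFFSET` paging):

```python
def modulo_pages(ids, modulus, page_size):
    """Order ids by (id % modulus, id), then split into pages,
    mimicking ORDER BY id % m, id with LIMIT/OFFSET."""
    ordered = sorted(ids, key=lambda i: (i % modulus, i))
    return [ordered[k:k + page_size] for k in range(0, len(ordered), page_size)]

ids = list(range(1, 11))

# Same modulus for every page request: stable order, no duplicates across pages.
assert modulo_pages(ids, 7, 4) == [[7, 1, 8, 2], [9, 3, 10, 4], [5, 6]]

# A different modulus (a new "listing") gives a different order,
# but still covers every id exactly once.
flat = [i for page in modulo_pages(ids, 3, 4) for i in page]
assert sorted(flat) == ids
```

Because the order depends only on `(modulus, id)`, any client holding the same modulus sees the same stable sequence, which is what makes the pagination duplicate-free.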
You can see how this might work for your database like this:

```shell
echo "SELECT id, id % 77 FROM my_table ORDER BY id % 77, id" | psql my_db > sort.txt
```
A prime modulus will probably give you the most variation. And if your ids start at 1 (such that `% 77` would leave the first 77 or so rows in roughly their normal order), you could try a modulus on a timestamp field instead. Eg:
`ORDER BY (extract(epoch from inserted_at) * 100000)::bigint % 77`
But you'd need an expression index to make that performant.
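A quick way to sanity-check that expression's behavior, sketched in Python (assumptions: `inserted_at` is a UTC timestamp, and `int()` truncation stands in for the `::bigint` cast):

```python
from datetime import datetime, timezone

def ts_sort_key(inserted_at, modulus=77):
    """Python analogue of (extract(epoch from inserted_at) * 100000)::bigint % modulus."""
    return int(inserted_at.timestamp() * 100000) % modulus

a = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
b = datetime(2024, 1, 1, 12, 0, 1, tzinfo=timezone.utc)

# Keys land in [0, modulus), and nearby timestamps get unrelated keys,
# which is what spreads the sort order around.
assert 0 <= ts_sort_key(a) < 77
assert ts_sort_key(a) != ts_sort_key(b)
```

One caveat: an expression index only helps when the query uses the exact indexed expression, so an index built with `% 77` won't help a query sorting by `% 11`. If the modulus varies per listing, the index covers only the one value it was built with.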