Sorting randomly with grouping in postgresql

Question

I have a table where the data may look like so:

ID | PARENT_ID | several_columns_to_follow
---+-----------+--------------------------
 1 |         1 | some data
 2 |         2 | ...
 3 |         1 | ...
 4 |         4 | ...
 5 |         3 | ...
 6 |         1 | ...
 7 |         2 | ...
 8 |         3 | ...

Per requirements, I need to allow sorting in two ways (per user request):

1) Random order of ID's within sequential parents - this is achieved easily with

SELECT * FROM my_table ORDER BY parent_id, random()

2) Random order of ID's within randomly sorted parents - and this is where I'm stuck. Obviously, just sorting the whole thing by random() won't be very useful.

Any suggestions? Ideally, in one SQL statement, but I'm willing to go for more than one if needed. The amount of data is not large, so I'm not worried about performance (there will never be more than about 100 rows in the final result).

score 0 · Answer 1 · edited May 09 '12 at 13:17

0

random() really is a nasty function in a final sort. If almost random is good enough, maybe you could sort on some hashed-up function of the row values, like

SELECT * FROM my_table 
ORDER BY parent_id, (id *12345) % 54321
     ;

edited May 09 '12 at 13:17

Aleks G

56,435
29
168
265

answered May 09 '12 at 13:15

wildplasser

43,142
8
66
109

Unfortunately, this wouldn't give me what I'm after. I need to get different sorting order every time (or at least almost) every time I query the same data. That is, without changing the data, the sorting order must change. Therefore the sorting cannot rely on anything in the row or any other static value. It truly needs to be random. And, by the way, why is it "nasty"? We use this sorting in hundreds of places in the system, which has been operating happily for years. – Aleks G May 09 '12 at 13:17
You could add a term derived from the current time? Nasty in the sense that random() is unstable (Duh), and that the final sort **must** take place (order in the results from the subqueries is ignored). For example: try using rand() as a compare function for qsort(). – wildplasser May 09 '12 at 13:19

score 0 · Accepted Answer · answered May 09 '12 at 13:28

0

Maybe this will do:

ORDER BY ((parent_id * date_part('microseconds', NOW()) :: Integer) % 123456),
((id * date_part('microseconds', NOW()) :: Integer) % 123456);

Maybe a prime number instead of 123456 will yield "more random" results

answered May 09 '12 at 13:28

niko

1,816
13
13

Exactly. It would be nice it there were some parametrised hash function for this kind of thing. – wildplasser May 09 '12 at 13:32
Yep, this is pretty much what I'm after. And, yes, using a large-ish prime yields better results. I used 391939 in my case (just a random 6-digit prime number - no pun intended). – Aleks G May 09 '12 at 13:48
Just for fun, any idea on how to pseudo-random sort a table with about 100,000,000 records in a time that would be suitable for a web application? Say, under 2 seconds? – Aleks G May 09 '12 at 13:50
@AleksG: This is just not possible with the hardware of today. Does not matter really, because no application can handle 100 mio. rows at a time anyway. [Here is an idea how to select a bunch of truly random rows from a huge table *fast*.](http://stackoverflow.com/a/8675160/939860) – Erwin Brandstetter May 10 '12 at 02:00

score 0 · Answer 3 · answered May 09 '12 at 13:50

0

If it's never more than 100 records, it may make sense just to select all the records, place it in a local table (or in memory data structure on your client end) and just do a proper Fisher-Yates shuffle.

answered May 09 '12 at 13:50

JayC

7,053
2
25
41

score 0 · Answer 4 · answered May 09 '12 at 16:04

0

Something like this should do it:

WITH r AS (SELECT id, random() as rand FROM my_table ORDER BY rand)
SELECT m.* FROM r JOIN my_table m USING (id)
  ORDER BY r.rand, random();

answered May 09 '12 at 16:04

kgrittn

18,113
3
39
47

Sorting randomly with grouping in postgresql

4 Answers4