0

In my case I am using an embedded H2 database but my problem is in fact general SQL one.

Consider this table in which one record may or may not reference another and in which the same record will never be referenced from more than one place.

CREATE TABLE test (id NUMBER, data VARCHAR, reference NUMBER) ;
INSERT INTO test (id, data) 
SELECT x, 'P'||x FROM system_range(0, 9);
UPDATE test SET reference = 2 where id = 4;
UPDATE test SET reference = 4 where id = 6;
UPDATE test SET reference = 1 where id = 7;
UPDATE test SET reference = 8 where id = 9;

SELECT * FROM test ORDER BY id;

ID  DATA    REFERENCE
----------------------------------
0   P0      null 
1   P1      null 
2   P2      null 
3   P3      null 
4   P4      2
5   P5      null 
6   P6      4 
7   P7      1 
8   P8      null 
9   P9      8 

Now I would like have an SQL that will select test records in a random order with the only one restriction a referenced record will never be selected before the one referring to it.

One thing that will work would be SELECT * FROM test ORDER BY reference, RAND() but to me this does not seem to be random enough as it will always select all non-referenced records first which reduces the level of randomness.

Say a good and valid result set cold be the one below.

ID  DATA    REFERENCE
----------------------------------
8   P8      null 
2   P2      null 
1   P1      null 
4   P4      2
3   P3      null 
9   P9      8 
5   P5      null 
6   P6      4 
0   P0      null
7   P7      1 

I would prefer a pure SQL solution but giving H2 is easy to extend I would not run out creating a custom function by exposing my own Java method.

UPDATE This is not a duplicate of How to request a random row in SQL because:

  1. In addition to the randomness request I have the reference restriction. In fact the level of complexity of my problem comes from this reference restriction and not from the random.
  2. I need to select all table records not only one
Community
  • 1
  • 1
Julian
  • 3,678
  • 7
  • 40
  • 72
  • Why not read the records into the client, and then randomize it? – Elliott Frisch Mar 21 '16 at 23:02
  • Possible and this will be my way to go if I cannot solve it using just SQL. However I don't want to run out a pure SQL solution – Julian Mar 21 '16 at 23:06
  • 1
    Possible duplicate of [How to request a random row in SQL?](http://stackoverflow.com/questions/19412/how-to-request-a-random-row-in-sql) – Bulat Mar 21 '16 at 23:09
  • This is not a duplicate as I have the reference restriction in addition to that. – Julian Mar 21 '16 at 23:27
  • Your data structure seems to be a forest of n-ary trees. With the reference restriction you are saying that once a node is selected you cannot select its parent. This is probably not something you can code directly in SQL since it requires keeping track of all previously selected nodes (rows). You'll have to load the rows into a set of trees in memory and navigate the structure yourself. – Jim Garrison Mar 21 '16 at 23:48
  • why you sort by reference if you want only random order SELECT * FROM test ORDER BY RAND() – Adam Silenko Mar 22 '16 at 00:36
  • Thank Jim. I guess in the end this is what I will end up by doing. However if I had to solve this problem in Oracle I am quite confident that by using the support for Hierarchical Queries (START WITH and CONNECT BY) I would be able to assign a rand() value to the root of each three and add a small fraction (say 0.00000001) multiplied by the LEVEL pseudo column so every three will be sorted. But to my knowledge this is not possible in H2. – Julian Mar 22 '16 at 02:21
  • Hi Bulat. Can you please remove the possible duplicate flag? Thanks. – Julian Mar 22 '16 at 20:06

1 Answers1

1

Well you should never say never before you actually dig a bit further. When I added my comment for Jim I actually asked myself if H2 comes up with a Hierarchical Queries Oracle equivalent. And of course there is something as explained in the H2 documentation under advanced section H2 recursive queries

So here there is my working query that pretty much satisfies my requirement:

WITH link(id, data, reference, sort_val, level, tree_id) AS (
    -- Each tree root starts with a random sorting value up to half the number of records.
    -- This half the number of records is not really needed it can be a hard coded value
    -- I just said half to achieve a relative uniform distribution of three ids
    -- take the id of the starting row as a three id
    SELECT id, data, reference, round(rand()*(select count(*) FROM test)/2) AS sort_val, 0, id FROM test WHERE reference IS NULL

    UNION ALL

    -- Increase the sort value by level for each referencing row
    SELECT test.id, test.data, test.reference, link.sort_val + (level + 1) AS sort_val, level + 1, link.tree_id
       FROM link
       JOIN test ON link.id = test.reference
)
-- sort value, level and tree id are printed here just to make it easier to understand how it works
SELECT id, data, reference, sort_val, level, tree_id
  FROM link
 ORDER BY sort_val;
Julian
  • 3,678
  • 7
  • 40
  • 72