
I have two tables in my database:

(1) PHRASES:

t_phrase
========
I like
They prefer
...
Somebody else wants

and

(2) PLACES:

n_id   t_place
====   =======
1      London
2      Paris
...
N      New York

Table PHRASES has at least as many rows as PLACES. I need to join these two tables so that every place is selected with exactly one phrase, and the phrases are distributed randomly across places. The PLACES table isn't big (maybe 3-4 thousand rows), and an additional WHERE clause on it will limit the output to about 200 places at most.

Ideally, I'd like this to be a single SQL statement, but so far I haven't been able to get my head around it. My fallback is a stored function returning rows of (int, varchar, varchar). For this, I was thinking of something along these lines:

  1. select all phrases in random order into an array of varchar
  2. loop over places taking one at a time and returning it along with the next phrase from the array

Somehow this seems to me very inefficient, but I can't come up with anything better.
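For reference, the two steps above can be modeled outside the database. This is a minimal Python sketch, not PL/pgSQL: the function name `assign_phrases` is hypothetical, and places are assumed to arrive as `(n_id, t_place)` pairs that have already been filtered by the WHERE clause. Once the phrases are shuffled, the loop is a single pass over the places.

```python
import random
from typing import List, Sequence, Tuple


def assign_phrases(
    places: Sequence[Tuple[int, str]],
    phrases: Sequence[str],
    rng: random.Random,
) -> List[Tuple[int, str, str]]:
    """Return one (n_id, t_place, t_phrase) row per place, with no phrase repeated."""
    if len(phrases) < len(places):
        raise ValueError("need at least as many phrases as places")
    # Step 1: all phrases in random order (each phrase appears exactly once).
    shuffled = list(phrases)
    rng.shuffle(shuffled)
    # Step 2: walk the places, handing each one the next phrase;
    # zip stops after the last place, leaving any extra phrases unused.
    return [(n_id, place, phrase) for (n_id, place), phrase in zip(places, shuffled)]


places = [(1, "London"), (2, "Paris"), (3, "New York")]
phrases = ["I like", "They prefer", "Somebody else wants"]
for row in assign_phrases(places, phrases, random.Random()):
    print(row)
```

Because every phrase occurs once in the shuffled list, no two places can receive the same phrase.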

Can you suggest a better idea? Or, even better, a single SQL statement?

Thanks in advance.

EDIT: Please note that the phrases should NOT be repeated in the result set. There are always at least as many phrases as there are places.

Aleks G

2 Answers

WITH p AS (
    SELECT n_id, t_place, row_number() OVER () AS rn   -- any gapless numbering
    FROM   places
    WHERE  <some condition>
    )
, ph AS (
    SELECT t_phrase, row_number() OVER (ORDER BY random()) AS rn   -- random order
    FROM   phrases
    )
SELECT p.n_id, p.t_place, ph.t_phrase
FROM   p
JOIN   ph USING (rn);

It won't get any more random if you impose a truly random order on both tables; it will only get slower. I impose the random order on phrases because, as the question states:

There are always at least as many phrases as there are places.

The shuffling needs to be done on the bigger set; otherwise the same subset of phrases (those numbered 1 to N) would be picked every time, and only their order would be random. For the smaller set (places), on the other hand, any sequence of numbers without gaps is good, so I pick the fastest way: row_number() OVER () with no ORDER BY.

My example uses CTEs, but it can be done with subqueries just as well. Both CTE and window functions require PostgreSQL 8.4 or later.
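The claim that shuffling both sides adds no randomness can be checked by brute force on a tiny case. The Python sketch below is a model, not PostgreSQL: it assumes `ORDER BY random()` yields a uniformly random permutation, treats the join on `rn` as pairing the i-th place with the i-th phrase, and uses hypothetical helper names. It enumerates every equally likely ordering for 3 places and 4 phrases and compares the exact distribution of place-to-phrase assignments with and without shuffling the places.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from typing import Dict, Sequence, Tuple


def join_on_row_number(place_order: Sequence[int], phrase_order: Sequence[int]) -> Tuple[int, ...]:
    """Pair the i-th place with the i-th phrase; return the phrase id per place id."""
    result = [0] * len(place_order)
    for place, phrase in zip(place_order, phrase_order):  # extra phrases are cut off
        result[place] = phrase
    return tuple(result)


def distribution(n_places: int, n_phrases: int, shuffle_places: bool) -> Dict[Tuple[int, ...], Fraction]:
    """Exact probability of every assignment, over all equally likely orders."""
    if shuffle_places:
        place_orders = list(permutations(range(n_places)))
    else:
        place_orders = [tuple(range(n_places))]  # one fixed, gapless numbering
    counts = Counter(
        join_on_row_number(pl, ph)
        for pl in place_orders
        for ph in permutations(range(n_phrases))
    )
    total = sum(counts.values())
    return {a: Fraction(c, total) for a, c in counts.items()}


one_side = distribution(3, 4, shuffle_places=False)
both_sides = distribution(3, 4, shuffle_places=True)
print(one_side == both_sides)  # True: identical distributions
```

Both variants give each of the 4 × 3 × 2 = 24 possible assignments the same probability, so the extra shuffle only costs time.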

Erwin Brandstetter

I think the following will work:

select (select t_phrase from phrases order by random() limit 1) as t_phrase,
       t_place
from places

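The intent is that the select within the select is evaluated for each row, returning a different phrase each time. (In PostgreSQL, an uncorrelated scalar subquery like this one is typically evaluated only once per query, so all places may actually receive the same phrase unless the subquery references the outer row.)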
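Even when an inner select like this does run once per row, each place draws independently from the whole phrase table. That is sampling with replacement, so repeated phrases are likely (the birthday problem). The Python sketch below computes the exact chance of at least one repeat; the function name is hypothetical, and the sizes (200 places, 4000 phrases) are assumptions loosely based on the numbers in the question, not measured data.

```python
def prob_repeat(n_places: int, n_phrases: int) -> float:
    """Chance that n_places independent uniform draws from n_phrases
    phrases contain at least one repeat (the birthday problem)."""
    if n_places > n_phrases:
        return 1.0  # pigeonhole: a repeat is guaranteed
    p_distinct = 1.0
    for i in range(n_places):
        # the (i+1)-th draw must avoid the i phrases already used
        p_distinct *= (n_phrases - i) / n_phrases
    return 1.0 - p_distinct


# Hypothetical sizes: about 200 filtered places, 4000 phrases.
print(f"{prob_repeat(200, 4000):.4f}")
```

With these assumed sizes the chance of at least one repeated phrase is above 99%, which is why a one-to-one pairing is needed instead.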

If you want a random one-to-one arrangement of phrases and places (no phrase repeated), you can use window functions:

select ph.t_phrase, p.t_place
from (select t_place, row_number() over (order by t_place) as seqnum
      from places p
     ) p join
     (select t_phrase, row_number() over (order by random()) as seqnum
      from phrases
     ) ph
     on p.seqnum = ph.seqnum

This numbers the places in order of place name (any column would do), numbers the phrases in random order, and joins on the resulting row numbers.
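The mechanics of this query can be modeled in a few lines of Python. This is a sketch, not PostgreSQL, and the helper names are hypothetical: places are numbered by name, phrases are numbered in shuffled order, and only numbers present on both sides are kept. The model also shows why the guarantee of at least as many phrases as places matters: an inner join silently drops any place whose number has no matching phrase.

```python
import random
from typing import Dict, List, Tuple


def row_numbers(values: List[str]) -> Dict[int, str]:
    """Mimic row_number(): number the values 1..len(values) in list order."""
    return {n: v for n, v in enumerate(values, start=1)}


def join_on_seqnum(places: List[str], phrases: List[str],
                   rng: random.Random) -> List[Tuple[str, str]]:
    # p: row_number() over (order by t_place)
    p = row_numbers(sorted(places))
    # ph: row_number() over (order by random())
    shuffled = list(phrases)
    rng.shuffle(shuffled)
    ph = row_numbers(shuffled)
    # inner join on seqnum: a place whose number has no phrase is dropped
    return [(ph[n], place) for n, place in sorted(p.items()) if n in ph]


places = ["London", "Paris", "New York"]
phrases = ["I like", "They prefer", "Somebody else wants"]
print(join_on_seqnum(places, phrases, random.Random()))
```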

Gordon Linoff
  • This won't work, as it will select random each time resulting potentially in repeated phrases. – Aleks G Jul 05 '12 at 15:38
  • I misunderstood. I thought you wanted a random phrase for each place. I will revise the solution for this case. – Gordon Linoff Jul 05 '12 at 15:42
  • What are the performance implications of the first example? I'd be concerned that, despite the `LIMIT 1`, it's still going to run over the entire table for every `place` (granted, not a large result set here, but...) – Clockwork-Muse Jul 05 '12 at 16:09
  • If you are interested in the performance implications, check this out: http://stackoverflow.com/questions/5297396/quick-random-row-selection-in-postgres. There is no a priori reason why postgres should read the table each time, but that is up to the database implementors. – Gordon Linoff Jul 05 '12 at 16:20
  • Yep, this works perfectly, even on our antiquated 8.3. Thanks! – Aleks G Jul 06 '12 at 08:37