0

I have three tables - tblpollquestions, tblpollanswers and tblpollresponses.

I want to select a random question that a user hasn't responded to yet, with the respective answers.

The SQL below returns exactly what I need, but I'm concerned that it takes three SELECTs to do it. There must surely be a more efficient way?

SELECT
    poll.id,
    poll.question,
    a.answer
FROM tblpollquestions poll
INNER JOIN tblpollanswers a ON a.question_id = poll.id
INNER JOIN (
    SELECT id FROM tblpollquestions WHERE id NOT IN(
        SELECT question_id FROM tblpollresponses WHERE user_id = 1
    ) ORDER BY RAND() LIMIT 1
) as t ON t.id = poll.id
Wintermute
  • 2,973
  • 4
  • 32
  • 52

3 Answers3

0

This could be made a bit better by switching NOT IN(SELECT...) into LEFT JOIN

SELECT
    poll.id,
    poll.question,
    a.answer
FROM 
    tblpollquestions poll
INNER JOIN 
    tblpollanswers a 
ON 
    a.question_id = poll.id
INNER JOIN (
    SELECT 
        q.id 
    FROM 
        tblpollquestions AS q
    LEFT JOIN
        tblpollresponses AS r
    ON
        q.id = r.question_id
        AND r.user_id = 1
    WHERE
        r.question_id IS NULL
    ORDER BY RAND() LIMIT 1
) as t ON t.id = poll.id

ORDER BY RAND() can also be slow if there are many rows in tblpollquestions table. See this presentation from Bill Karwin (slide 142 and onwards) for some other ideas on selecting a random row.

http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back

Mchl
  • 61,444
  • 9
  • 118
  • 120
  • MySQL optimizes `ORDER BY RAND() LIMIT 1` very nicely - it's actually quite fast. If you are comparing to my query below, LEFT JOIN with IS NULL and NOT EXISTS should have equal speeds in MySQL. (Take a look at the explain - they should be the same.) But I think doing it in a where will be faster than a join. Edit: Maybe not: http://jan.kneschke.de/projects/mysql/order-by-rand/ - you will need to explain both versions and decide based on that. – Ariel Jul 26 '11 at 12:07
  • I haven't heard of ORDER BY RAND() optimization. Can you provide any link to information about it? As far as your `NOT EXISTS` subquery goes - you use dependent subquery - I think MySQL 5.4 or newer can deal with this, but older versions are notorious for running such subqueries once for every row in outer query. – Mchl Jul 26 '11 at 12:11
  • See here: http://dev.mysql.com/doc/refman/5.0/en/limit-optimization.html MySQL does not actually sort the entire table by rand() - it just gets one row and stops. – Ariel Jul 26 '11 at 12:13
  • @Ariel Nope, `ORDER BY RAND() LIMIT 1` is very slow and on larger data sets almost unusable. – Karolis Jul 26 '11 at 12:16
  • 1
    Read further: `If a filesort must be done, all rows that match the query without the LIMIT clause must be selected, and most or all of them must be sorted, before it can be ascertained that the first row_count rows have been found.` - and ordering by non-deterministic function (like `RAND()`) will not use any indexes. – Mchl Jul 26 '11 at 12:16
  • 1
    @Karolis & Mchl, you guys are right. I just did some tests and it's slow. – Ariel Jul 26 '11 at 12:20
  • It's good you decided to test it rather than believe our words ;) Check the presentation I linked in the answer - it's really good. – Mchl Jul 26 '11 at 12:23
  • 1
    This did the job nicely, thank you. And thanks for the info about RAND() - that's a really good presentation. I don't anticipate my questions table getting >100 rows, so it's not a major performance issue for now (loads gun, takes aim at big toe) but I'll keep it in mind. – Wintermute Jul 26 '11 at 14:02
  • Bill Karwin has an answer in SO too, regarding fetching a random row from a big table: http://stackoverflow.com/questions/6541644/selecting-random-rows-with-mysql/6542113#6542113 – ypercubeᵀᴹ Jul 30 '11 at 07:02
0

Is seems fine to me, although I would change it slightly:

SELECT
  poll.id,
  poll.question,
  a.answer
FROM tblpollquestions poll
INNER JOIN tblpollanswers a ON a.question_id = poll.id
WHERE poll.id = (
    SELECT id FROM tblpollquestions WHERE NOT EXISTS (
        SELECT * FROM tblpollresponses WHERE user_id = 1 AND question_id = tblpollquestions.id )
    ORDER BY RAND() LIMIT 1)

Written that way should do a better job of using indexes, and not checking the join conditions for every single tblpollanswers.

Make sure you have a UNIQUE index (or primary key) on tblpollresponses for (user_id, question_id) (in that order). If you need it for other queries, you can add an additional UNIQUE index with the columns in the reverse order.

Edit: Actually putting it in the where might not be so good http://jan.kneschke.de/projects/mysql/order-by-rand/ You will need to explain the query and compare.

Ariel
  • 25,995
  • 5
  • 59
  • 69
0

Use left join like this:

SELECT ques.id, ques.question, ans.answer FROM tblpollquestions ques
INNER JOIN tblpollanswers ans ON(ans.question_id = ques.id)
left join tblpollresponses res on(res.question_id=ques.id and user_id = 1)
where res.question_id is null ORDER BY RAND() LIMIT 1;

I changed your table aliases to make better sense.

Ms Vt
  • 1
  • 1