1

I currently have two tables

question
--------
id
title, character varying

answer
--------
id
question_id
votes, integer

I use the following query to return me a list of questions with its corresponding array of votes:

SELECT question.id,
    question.title,
    ARRAY(SELECT votes
          FROM answer
          WHERE answer.question_id = question.id)
FROM question
ORDER BY question.id

The output looks like:

id  | title    | ?column?                       
----+----------+-----------------------------------------------------
100 | How to   | {5,2,7}
101 | Where is | {0}
102 | What is  | {1}

The above query can take close to 50s to run with hundred of thousands of questions where each question can have at least 5 answers. Is there a way to optimise the above?

Thierry Lam
  • 45,304
  • 42
  • 117
  • 144

3 Answers3

2

You should use a join:

SELECT question.id, question.title, answer.votes
FROM question
JOIN answer ON answer.question_id == question.id
ORDER BY question.id

If you want the output column to contain a concatenated list of all "votes" associated with a question, and you are on Postgres, check out this question: How to concatenate strings of a string field in a PostgreSQL 'group by' query?

Community
  • 1
  • 1
arc
  • 584
  • 2
  • 5
0

If you want the query to produce one row per question, with votes gathered into an array, you can use a join, with array_agg:

SELECT question.id,
    question.title,
    array_agg(answer.votes) as answer_votes
FROM question
JOIN answer ON answer.question_id = question.id
GROUP BY question.id, question.title
ORDER BY question.id
SingleNegationElimination
  • 151,563
  • 33
  • 264
  • 304
0

I recommend creating an index on your answer table, and using your original query.

CREATE INDEX answer_question_id_idx ON answer(question_id);

Without this index, it will have to do a sequential scan of the entire table to find rows with a matching question_id. It will have to do that for every single question.

Alternatively, consider using a join, as arc suggested. I'm not an expert in the matter, but I think Postgres will use a hash join rather than multiple sequential scans, making the query faster. If you want to retain the id/title/array format, use array_agg:

SELECT question.id, question.title, array_agg(answer.votes)
  FROM question
  LEFT JOIN answer ON answer.question_id = question.id
 GROUP BY question.id, question.title
 ORDER BY question.id;

However, there's a caveat. If a question has no answers, you'll get a weird-looking result:

 id |       title       | array_agg 
----+-------------------+-----------
  1 | How do I do this? | {3,5}
  2 | How do I do that? | {NULL}
(2 rows)

This is because of the LEFT JOIN, which creates a NULL value when no rows from the joined table are available. With INNER JOIN, the second row won't appear at all.

That's why I recommend using your original query. It produces the expected result:

 id |       title       | ?column? 
----+-------------------+----------
  1 | How do I do this? | {3,5}
  2 | How do I do that? | {}
Community
  • 1
  • 1
Joey Adams
  • 41,996
  • 18
  • 86
  • 115
  • I already have this index in place and it still takes a while. I don't see a lot of improvements with the joining method after using `EXPLAIN`. – Thierry Lam Mar 05 '12 at 21:15