Retrieving one to many associated data with specific criteria, in a single row

Question

Using Postgres 9.6+, suppose I have a simple survey model with three tables: questions, users, and answers. Each answer refers to a question and contains text written by a user.

Here's a simplified example:

             questions
|----|------------------------------|
| id | text                         |
|----|------------------------------|
|  1 | Do you like apples?          |
|  2 | What is your favorite color? |
|  3 | How are you feeling today?   |
-------------------------------------

    users
|----|-------|
| id | name  |
|----|-------|
|  1 | Alice |
|  2 | Bob   |
--------------

                answers
|----|---------|-------------|---------|
| id | user_id | question_id | text    |
|----|---------------------------------|
|  1 | 1       | 1           | Yes     |
|  2 | 2       | 1           | No      |
|  3 | 1       | 2           | Yellow  |
|  4 | 2       | 2           | Blue    |
|  5 | 1       | 3           | Good    |
|  6 | 2       | 3           | Fine    |
----------------------------------------

My end goal is to be able to feed this into an app that will render a datatable in which each row contains a single user's info alongside their answers to a dynamic set of questions. In other words, it might want to retrieve users' answers for questions 1 and 3 specifically:

|----|-------|---------------------|----------------------------|
| id | name  | Do you like apples? | How are you feeling today? |
|----|-------|---------------------|----------------------------|
|  1 | Alice | Yes                 | Good                       |
|  2 | Bob   | No                  | Fine                       |
-----------------------------------------------------------------

Or any arbitrary combination of questions. Realistically these lists of question_ids might be capped at 10 items or so.

At first, I naively tried to dynamically construct the query JOINing on answers once for each question:

SELECT
  u.name name, 
  a1.text q1, 
  a2.text q2
FROM
  users u
JOIN answers a1 
  ON a1.user_id = u.id AND question_id = 1
JOIN answers a2 
  ON a1.user_id = u.id AND question_id = 3

But as the number of JOINs grows, more nested loops seem to be made and performance started taking an increasingly high hit.

An alternative that came up was to try a subquery...

SELECT
  u.name name, 
  sub.qid question_id,
  sub.text text
FROM
  users u
JOIN (
  SELECT
    a.user_id uid,
    a.question_id qid,
    a.text text
  FROM
    answers a
  WHERE
    a.question_id IN (1, 3)
  GROUP BY
    a.user_id
) sub ON sub.uid = u.id

...but since this produces one row for each answer instead of for each user, I ended up resorting to tricks such as aggregating the answers in a JSONB object to keep the format as desired, replacing the subquery with:

  SELECT
    a.user_id uid,
    jsonb_object_agg(a.question_id, a.text) question_answers
  FROM
    answers a
  WHERE
    a.question_id IN (1, 3)
  GROUP BY
    a.user_id

Which works alright for retrieving the whole dataset, but becomes a pain when retrieving ranges for datatable specific operations... In particular pagination and/or sorting by one of the answers' columns (e.g. sort results by answers of "How are you feeling today?" in descending order).

I feel like I might be missing something here or falling into an antipattern, so I decided to reach for help.

What's the best / most sane way to achieve this kind of result? Thanks a bunch!!

The aggregation into a JSONB object seems your best option. You can still use it to sort: `select * from ( your query here ) t order by questions_answers ->> 'Do you like apples?'` — , Jul 27 '20 at 07:01
@MikeOrganek @a_horse_with_no_name I experimented with both the `crosstab` function and the JSONB sorting notation - both worked flawlessly and run times were reasonable and very similar! In particular I wasn't expecting the JSONB route to be able to leverage indices as well as it did. Super cool. Thanks to both! I'd upvote but my newbie reputation still forbids me. — Giancarlo França, Jul 27 '20 at 18:04

score 0 · Answer 1 · answered Jul 27 '20 at 08:20

I create an example for your question. I used PIVOT for better performance.

CREATE TABLE #questions  (
  id   int,
  [text]  VARCHAR(100),
);
CREATE TABLE #users  (
  id   int,
  [name]  VARCHAR(100),
);

CREATE TABLE #answers  (
  id   int,
  user_id int,
  question_id int,
  [text]  VARCHAR(100),
);

INSERT INTO #questions
  (id, [text])
VALUES
  (1,'Do you like apples?'),
  (2,'What is your favorite color?'),
  (3,'How are you feeling today?')
  ;
  INSERT INTO #users
  (id, [name])
VALUES
  (1,'Alice'),
  (2,'Bob')
  ;
  INSERT INTO #answers
  (id,user_id,question_id, [text])
VALUES
  (1,1,1,'Yes'),
  (2,2,1,'No'),
  (3,1,2,'Yellow'),
  (4,2,2,'Blue'),
  (5,1,3,'Good'),
  (6,2,3,'Fine')
  ;

   declare @questions VARCHAR(100)
 ;with CTE_question as (select text ttext from #questions)

 select @questions=STRING_AGG(+'['+ttext +']',',') from CTE_question

 

 execute('SELECT * FROM   
(
    SELECT 
        u.name [Name], 
        q.text [Question],
        a.text [Answer]
    FROM 
        #questions q 
                 join #answers a on q.id = a.question_id
                 join #users u on a.user_id=u.id
                 
) t 
PIVOT(
    max([Answer]) 
    FOR [Question] IN (
      '+  @questions+'
        )
) AS pivot_table
')

result =

Name    Do you like apples?   What is your favorite color?  How are you feeling today?
Alice     Yes                   Yellow                               Good
 Bob      No                     Blue                                 Fine

I had no prior experience with pivot tables so it was a bit hard to grasp what's going on. Currently using Postgres which doesn't seem to have support for `PIVOT` so I couldn't test it properly either. But thanks for taking the time to write this example! — Giancarlo França, Jul 27 '20 at 18:09

Retrieving one to many associated data with specific criteria, in a single row

1 Answers1