1

I have the following SQL query that count the distinct quizzes where a tag is below a certain threshold.

SELECT
  COUNT(DISTINCT "quizzes"."id")
FROM
  "quizzes"
  INNER JOIN "sessions" ON "sessions"."id" = "quizzes"."session_id"
  INNER JOIN "subscriptions" ON "subscriptions"."id" = "sessions"."subscription_id"
  LEFT OUTER JOIN "quiz_answers" ON "quiz_answers"."quiz_id" = "quizzes"."id"
  LEFT OUTER JOIN "answers" ON "answers"."id" = "quiz_answers"."answer_id"
  LEFT OUTER JOIN "tag_scores" ON "tag_scores"."answer_id" = "answers"."id"
  LEFT OUTER JOIN "tags" ON "tags"."id" = "tag_scores"."tag_id"
WHERE
  (subscriptions.state = 'subscribed')
  AND (
    tags.id = 56
    and score <= 10
  );

The score is a sum of the same tag present under a same quiz.

I want to be able to add in the query something like this:

SUM(tag_scores.score) group by tags.id, quizzes.id AS score

Such that the score is calculated not on the particular tag score but on the sum of the same tags grouped by tags.id and quizzes.id.

The output should be simply the count number of the result.

I'm having a hard time how to add this to the query.

Any idea or hint how to continue?

Martin
  • 11,216
  • 23
  • 83
  • 140
  • 1
    Could we see some sample data and its expected output? – Bohemian Nov 14 '21 at 03:43
  • I intentionally avoided adding sample data because of all the joints and that it will lead to unnecessary confusion. As what is relevant for the example above, the scores should be summed from `tag_scores.score` and grouped by `tags.id` and `quizzes.id`. The result should be simply a count number of the rows found. – Martin Nov 14 '21 at 03:47
  • `count(*)` gives a count of rows. Does that not work? – Bohemian Nov 14 '21 at 03:53
  • The query I posted work. but it is missing the `SUM(tag_scores.score) group by tags.id, quizzes.id AS score` part which I dont know where to put in the query. – Martin Nov 14 '21 at 03:54
  • `group by tags.id, quizzes.id` disagrees with a single `COUNT(DISTINCT "quizzes"."id")` being emitted. Might make sense if you add `tags.id` to the output. But you don't want that. So do you want to enforce that the sum is below 10 for every included `tag.id` separately? Please explain. – Erwin Brandstetter Nov 14 '21 at 04:30
  • The `COUNT(DISTINCT "quizzes"."id")` should be made after the query does `SUM(tag_scores.score) group by tags.id, quizzes.id AS score` such that the score can be used in the where clause. After that, I need to do the count distinct on the results. – Martin Nov 14 '21 at 04:33
  • So just the *one* score for tag 56 (summed across all answers) must be below 10? Do we allow no tag 56 at all, or exclude those without? – Erwin Brandstetter Nov 14 '21 at 04:39
  • Correct. We exclude all those without as well. – Martin Nov 14 '21 at 04:41

1 Answers1

2

This should do it:

SELECT count(*)
FROM   quizzes q
JOIN   LATERAL (
   SELECT SUM(ts.score) AS sum_score
   FROM   quiz_answers qa
   JOIN   answers      a  ON a.id = qa.answer_id
   JOIN   tag_scores   ts ON ts.answer_id = a.id
-- JOIN   tags         t  ON t.id = ts.tag_id  -- not needed
   WHERE  qa.quiz_id = q.id
   AND    ts.tag_id = 56
   ) AS score56 ON score56.sum_score <= 10
WHERE  EXISTS (
   SELECT FROM sessions se
   JOIN   subscriptions su ON su.id = se.subscription_id
   WHERE  se.id = q.session_id
   AND    su.state = 'subscribed'
   );

The LATERAL join enforces your desired additional filter. Quizzes are eliminated where ...

  • no tag_id 56 is connected at all (then sum_score is NULL).
  • or the sum of all scores for tag 56 is > 10.

LATERAL makes sense for a subset (when WHERE EXISTS ... actually filter more than a few rows). Else, a plain subquery on all answer tags is cheaper.

Since we do not multiply rows from the main table quizzes with joins, we also don't need the expensive count(DISTINCT ...). A simple count(*) does it now.

I moved your original filter to an EXISTS expression since I don't know whether those joins can multiply rows. If sessions and subscriptions are in a 1:1 relationship to quizzes, then you can keep your original joins for those.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Wow! nice query, I'll be studying afterwards. However, I'm getting `ERROR: column score56.score does not exist` – Martin Nov 14 '21 at 04:46
  • @Martin: Note the fix. Column alias was missing. – Erwin Brandstetter Nov 14 '21 at 04:50
  • I noticed when multiple JOIN LATERAL are present searching for scores in different tags and sometimes using OR the query goes exponentially slow: https://pastebin.com/pz87d4HC. Is there any way to prevent this? – Martin Nov 14 '21 at 20:55
  • @Martin: You might be able to do it all in a single lateral subquery. Start a new question with details. – Erwin Brandstetter Nov 14 '21 at 20:57
  • See: https://stackoverflow.com/questions/69967159/optimize-exponentially-slow-query-when-multiple-join-lateral-are-present – Martin Nov 14 '21 at 21:03