5

I am using PostgreSQL 9.4.

I have a table of workouts. Users can create multiple results for each workout, and a result has a score.

Given a list of workout_ids and two user_ids, I want to return the best score for each workout for each user. If the user does not have a result for that workout, I want to return a padded/null result.

SELECT "results".*, "workouts".* 
FROM "results" LEFT JOIN "workouts" ON "workouts"."id" = "results"."workout_id" 
WHERE (
  (user_id, workout_id, score) IN 
  (SELECT user_id, workout_id, MAX(score) 
    FROM results WHERE user_id IN (1, 2) AND workout_id IN (1, 2, 3) 
    GROUP BY user_id, workout_id)
) 

In this query, the left join is acting as an inner join; I'm not getting any padding if the user has not got a result for the workout. This query should always return six rows, regardless of how many results exist.

Example data:

results
user_id | workout_id | score 
-----------------------------
      1 |          1 |     10
      1 |          3 |     10
      1 |          3 |     15
      2 |          1 |      5

Desired result:

results.user_id | results.workout_id | max(results.score) | workouts.name
-------------------------------------------------------------------------
              1 |                  1 |                 10 | Squat
              1 |                  2 |               null | Bench
              1 |                  3 |                 15 | Deadlift
              2 |                  1 |                  5 | Squat
              2 |                  2 |               null | Bench
              2 |                  3 |               null | Deadlift
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Ben Smith
  • 851
  • 2
  • 10
  • 22

3 Answers3

4

The where filters out your NULL values, so that is why the result is not what you expect.

Joinint the WHERE clause results instead of filter the where clause results.

SELECT "results".*, "workouts".*,"max_score".*
FROM "results" 
LEFT JOIN "workouts" ON "workouts"."id" = "results"."workout_id"
LEFT JOIN (SELECT user_id, workout_id, MAX(score) 
    FROM results WHERE user_id IN (1, 2) AND workout_id IN (1, 2, 3) 
    GROUP BY user_id, workout_id) max_score ON workouts.workout_id=max_score.workout_id;

You need to alter the SELECT to get the correct columns.

Norbert
  • 6,026
  • 3
  • 17
  • 40
  • This returns rows for me where the user_id IN (1, 2) AND workout_id IN (1, 2, 3) constraints are not adhered to. – Ben Smith Mar 23 '15 at 02:33
  • So the join should be on user_id and workout_id against the results table instead? – Norbert Mar 23 '15 at 02:35
  • 1
    I think the left join of `max_score` needs to be on `(workout_id,user_id,score)` – FuzzyTree Mar 23 '15 at 02:41
  • @FuzzyTree : You are probably right. The example data was added after the answer. Ben is now hopefully on the right track to get it finished himself :) – Norbert Mar 23 '15 at 02:44
2
SELECT DISTINCT ON (1, 2)
       u.user_id
     , w.id AS workout_id
     , r.score
     , w.name AS workout_name
FROM   workouts w
CROSS  JOIN (VALUES (1), (2)) u(user_id)
LEFT   JOIN  results r ON r.workout_id = w.id
                      AND r.user_id = u.user_id
WHERE  w.id IN (1, 2, 3)
ORDER  BY 1, 2, r.score DESC NULLS LAST;

Step by step explanation

  1. Form a complete Cartesian product of given workouts and users.
    Assuming the given workouts always exist.
    Assuming that not all given users have results for all given workouts.

  2. LEFT JOIN to results. All conditions go into the ON clause of the LEFT JOIN, not into the WHERE clause, which would exclude (workout_id, user_id) combinations that have no result. See:

  3. Finally pick the best result per (user_id, workout_id) with DISTINCT ON. While being at it, produce the desired sort order. See:

Depending on the size of tables and data distribution there may be faster solutions. See:

Simple version

If all you want is the maximum score for each (user_id, workout_id) combination, there is simple version:

SELECT user_id, workout_id, max(r.score) AS score
FROM        unnest('{1,2}'::int[])   u(user_id)
CROSS  JOIN unnest('{1,2,3}'::int[]) w(workout_id)
LEFT   JOIN results r USING (user_id, workout_id)
GROUP  BY 1, 2
ORDER  BY 1, 2;

db<>fiddle here
Old sqlfiddle.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

How about using distinct on or row_number()?

SELECT DISTINCT ON (r.user_id, r.workout_id) r.*, w.* 
FROM "results" r LEFT JOIN
     "workouts" w
     ON "w."id" = r."workout_id" 
WHERE r.user_id IN (1, 2) AND r.workout_id IN (1, 2, 3) 
ORDER BY r.user_id, r.workout_id, score desc;

The row_number() equivalent requires a subquery:

SELECT rw.*
FROM (SELECT r.*, w.*,
             row_number() over (partition by user_id, workout_id order by score desc) as seqnum 
      FROM "results" r LEFT JOIN
           "workouts" w
           ON "w."id" = r."workout_id" 
      WHERE r.user_id IN (1, 2) AND r.workout_id IN (1, 2, 3) 
     ) rw
WHERE seqnum = 1;

You should choose the columns more judiciously than using a *. The subquery might return errors in the case of duplicate column names.

EDIT:

You need to generate the rows first, and then the results for each. Here is one method, building on the second query:

SELECT u.user_id, w.workout_id, rw.score, rw.name
FROM (SELECT 1 as user_id UNION ALL SELECT 2) u CROSS JOIN
     (SELECT 1 as workout_id UNION ALL SELECT 2 UNION ALL SELECT 3) w LEFT JOIN
     (SELECT r.*, w.*,
             row_number() over (partition by user_id, workout_id order by score desc) as seqnum 
      FROM "results" r LEFT JOIN
           "workouts" w
           ON "w."id" = r."workout_id" 
      WHERE r.user_id IN (1, 2) AND r.workout_id IN (1, 2, 3) 
     ) rw
     ON rw.user_id = u.user_id and rw.workout_id = w.workout_id and
        rw.seqnum = 1;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Apologies if I misunderstood your answer, but both of these queries only returned two rows, rather than the desired six (2 user_ids x 3 workout_ids = 6 rows needed). I will annotate my original question with some sample data and the desired result to make it clearer. – Ben Smith Mar 23 '15 at 02:24
  • @BenSmith . . . My misunderstanding. I thought the best was per user, not per user/workout. – Gordon Linoff Mar 23 '15 at 02:25