With over 1800 tagged questions for greatest-n-per-group
and some excellent answers, I thought I'd find a solution to this one - but I've either missed the solution or I need a new approach.
I have a table photo_types
to store votes by user
who are voting (up or down) on what particular photo_type
they think a given photo is. Photo types are 1-10
and each vote will be 1
or -1
.
+----+-----+-----------+------------+------+
| id | user | photo_id | photo_type | vote |
+----+------+----------+------------+------+
| 1 | jane | photo1 | 1 | 1 |
| 2 | jane | photo2 | 2 | 1 |
| 3 | jane | photo3 | 4 | -1 |
| 4 | ben | photo1 | 1 | 1 |
| 5 | ben | photo2 | 3 | -1 |
| 6 | ben | photo2 | 2 | 1 |
| 7 | mary | photo1 | 1 | -1 |
| 8 | mary | photo3 | 10 | 1 |
| 9 | mary | photo2 | 1 | 1 |
| 10 | mary | photo1 | 2 | -1 |
+----+------+----------+------------+------+
I need to left join this table back to a photos
table (that has all the other details of a given photo) - but only include the top 2 voted types for each photo.
The photos
table to which I need to LEFT JOIN
the photo_types
table looks like:
+----+----------+------------+----------------+---------------+------------+
| id | photo_id | photo_name | photographer | location | date |
+----+----------+------------+----------------+---------------+------------+
| 1 | photo1 | the bridge | Bill Murray | Brooklyn, NY | 2012-10-11 |
| 2 | photo2 | the cat | Jacques Chirac | Paris, France | 2013-01-03 |
| 3 | photo3 | a car | the Grinch | London, UK | 2016-09-01 |
+----+----------+------------+----------------+---------------+------------+
I'm obviously joining the two tables via photo_id
.
To get the top voted types for each photo I have tried a sub-query like this:
SELECT photo_id, photo_type, sum(vote) AS votes
FROM photo_types
GROUP BY photo_type, photo_id
HAVING votes>0
ORDER BY votes DESC
which groups the sum of the votes by photo_type
as well as photo_id
.
This works fine but includes all types with sum(vote) > 0
- not just the top 2 voted types.
SQL Fiddle here
When included in the join it looks like:
SELECT *
FROM photos
LEFT JOIN
(SELECT photo_id, photo_type, sum(vote) AS votes
FROM photo_types
GROUP BY photo_type, photo_id
HAVING votes>0
ORDER BY votes DESC) AS pt
ON photos.photo_id = pt.photo_id
WHERE photos.date > '2010-01-01';
I had hoped to use Bill Karwin's solution but I'm having trouble joining the table to itself based on the grouped values (which is a SUM
in my case). The sub-query I tried looked like :
SELECT pt1.*, SUM(pt1.vote) AS votes1, SUM(pt2.vote) AS votes2
FROM photo_types AS pt1
LEFT OUTER JOIN photo_types AS pt2
ON pt1.photo_id = pt2.photo_id
AND (votes1 < votes2
OR (votes1 = votes2 AND pt1.id < pt2.id))
WHERE pt2.photo_id IS NULL
...which doesn't work because it's an attempt to join two tables on a calculated value (unlike Bill's solution).
SQL Fiddle here
Question
Is there a way to get the greatest-n-per-group
when the grouping is based on a calculated values such as SUM(xxx)
?
Solutions which partially cover this are here and here but don't include aggregates in the grouped values.
The other obvious way to do it is to simply re-calculate the highest voted values each time a vote is placed and store this directly in the photos
table - as discussed here - but unless it's impossible - I'd prefer to calculate within the SELECT
for various reasons.