You're almost there; you just need to do the grouping using your item_category table, since that's where the cat_id's are.
SELECT ...
FROM item_category AS c1
LEFT OUTER JOIN item_category AS c2
ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
GROUP BY c1.cat_id, c1.image_id
HAVING COUNT(*) < 4
Then once you've got that, you know that c1 contains the top four images per category. You can then join c1 to the image table to get other attributes:
SELECT i.id, i.title, c.cat_name AS CAT
FROM item_category AS c1
LEFT OUTER JOIN item_category AS c2
ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
INNER JOIN image AS i ON c1.image_id = i.id
INNER JOIN categories AS c ON c1.cat_id = c.id
GROUP BY c1.image_id
HAVING COUNT(*) < 4;
Although this isn't strictly legal SQL due to the single-value rule, MySQL will permit it.
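If you run with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7), that same single-value rule will make MySQL reject the query. A minimal strict-mode-friendly sketch, assuming the same table names, is to name every selected column in the GROUP BY (or wrap the extras in ANY_VALUE()):
SELECT i.id, i.title, c.cat_name AS CAT
FROM item_category AS c1
LEFT OUTER JOIN item_category AS c2
ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
INNER JOIN image AS i ON c1.image_id = i.id
INNER JOIN categories AS c ON c1.cat_id = c.id
GROUP BY i.id, i.title, c.cat_name /* every selected column appears in the GROUP BY */
HAVING COUNT(*) < 4;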
Copied from comments thread:
I would fetch the full result, store it in a cache, and then iterate over it however I want, using application code. That would be far simpler and have better performance. SQL is powerful, but another solution may be easier to develop, debug, and maintain.
You can certainly use LIMIT to iterate through the result set:
SELECT i.id, i.title, c.cat_name AS CAT
FROM item_category AS c1
LEFT OUTER JOIN item_category AS c2
ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
INNER JOIN image AS i ON c1.image_id = i.id
INNER JOIN categories AS c ON c1.cat_id = c.id
GROUP BY c1.image_id
HAVING COUNT(*) < 4
ORDER BY c.cat_id
LIMIT 4 OFFSET 16;
But keep in mind that using OFFSET means MySQL has to run the query again each time you view another page. There are optimizations in MySQL so that it quits a query once it has found enough rows, but it's still expensive if you page frequently or advance far into the series of pages.
Two possible optimizations you can use: one is to cache part of the result, on the theory that few users will want to advance through every page of a large paginated result. So, for example, fetch enough to populate ten pages' worth of results and cache that. It reduces the number of queries a lot, and perhaps only 1% of the time will a user advance into the next set of ten pages.
SELECT i.id, i.title, c.cat_name AS CAT
FROM item_category AS c1
LEFT OUTER JOIN item_category AS c2
ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
INNER JOIN image AS i ON c1.image_id = i.id
INNER JOIN categories AS c ON c1.cat_id = c.id
GROUP BY c1.image_id
HAVING COUNT(*) < 4
ORDER BY c.cat_id
LIMIT 40 OFFSET 40; /* second set of ten pages */
Another optimization, if you can assume that any view of page N will be coming from a view of page N-1, is to have the request filter the categories based on the greatest cat_id seen on page N-1. You need to do it this way because OFFSET works by row number in the result set, while filtering on an indexed value works by the values found on those rows; those aren't the same offset if there are gaps or unused cat_id values.
SELECT i.id, i.title, c.cat_name AS CAT
FROM item_category AS c1
LEFT OUTER JOIN item_category AS c2
ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
INNER JOIN image AS i ON c1.image_id = i.id
INNER JOIN categories AS c ON c1.cat_id = c.id
WHERE c1.cat_id > 47 /* this value is the largest seen in previous page */
GROUP BY c1.image_id
HAVING COUNT(*) < 4
ORDER BY c.cat_id
LIMIT 40; /* no offset needed */
Re your comments:
... using LIMIT and OFFSET will only trim those results and not move me down the list of rows.
LIMIT is working as intended; it applies to the resulting rows after GROUP BY and HAVING have done their work.
The way I was doing it before the greatest N per category query is by
1. pulling in x amount of images,
2. Remembering which was the last image, and then
3. using a sub query on my subsequent queries to get the next x amount of images with ids smaller than the last image. Is something like that possible with greatest N per group?
That's what my WHERE clause does in the last example above, without using a subquery. And I'm assuming you're advancing to the next higher set of cat_id's. This solution works only if you're advancing one page at a time, and in the positive direction.
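If you ever need to step back to the previous page, a hedged sketch of one common variation (not part of the approach above; same table names assumed) is to reverse both the keyset comparison and the sort, then re-sort that block in an outer query:
SELECT * FROM (
  SELECT i.id, i.title, c.cat_name AS CAT, c.cat_id
  FROM item_category AS c1
  LEFT OUTER JOIN item_category AS c2
    ON c1.cat_id = c2.cat_id AND c1.image_id < c2.image_id
  INNER JOIN image AS i ON c1.image_id = i.id
  INNER JOIN categories AS c ON c1.cat_id = c.id
  WHERE c1.cat_id < 47 /* this value is the smallest seen on the current page */
  GROUP BY c1.image_id
  HAVING COUNT(*) < 4
  ORDER BY c.cat_id DESC
  LIMIT 40
) AS prev_page
ORDER BY cat_id; /* restore ascending order for display */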
All right, there's another solution for greatest-n-per-group that works with MySQL, but it relies on the user variables feature. SQLite doesn't have this feature.
SELECT * FROM (
  SELECT
    p.id AS image_ID, p.imageURL AS URL, c.cat_name AS CAT, ic.cat_id,
    IF(@cat=ic.cat_id, @row:=@row+1, @row:=1) AS _row, @cat:=ic.cat_id AS _cat
  FROM (SELECT @cat:=null, @row:=0) AS _init
  CROSS JOIN image_category AS ic
  INNER JOIN portfolio AS p ON ic.image_id = p.id
  INNER JOIN categories AS c ON ic.cat_id = c.cat_id
  ORDER BY ic.cat_id, ic.image_id
) AS x
WHERE _row BETWEEN 4 AND 6; /* or choose any range you want */
This is similar to using ROW_NUMBER() OVER (PARTITION BY cat_id), which is standard SQL and supported by most other RDBMSs, but SQLite doesn't support that either yet.
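For a database that does support window functions (MySQL 8.0 and later, for example), a hedged sketch of the equivalent query, reusing the same table and column names as above, would be:
SELECT image_ID, URL, CAT
FROM (
  SELECT
    p.id AS image_ID, p.imageURL AS URL, c.cat_name AS CAT,
    ROW_NUMBER() OVER (PARTITION BY ic.cat_id ORDER BY ic.image_id) AS _row
  FROM image_category AS ic
  INNER JOIN portfolio AS p ON ic.image_id = p.id
  INNER JOIN categories AS c ON ic.cat_id = c.cat_id
) AS x
WHERE _row BETWEEN 4 AND 6; /* same range as the user-variable version */
The window function does the per-category row numbering that the user variables emulate, so the outer WHERE can slice whatever range of rows per category you want.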