The correct query for your request is:
SELECT a.albumid, f.photoname
FROM TBLalbum a
INNER JOIN TBLphotos f # "f" from "first" photo
ON f.albumid = a.id # of this album
LEFT JOIN TBLphotos o # "o" from "other" photo
ON o.albumid = f.albumid # of the same album
AND o.photoid < f.photoid # "o" is before "f"
WHERE o.photoid IS NULL # no such "o" exists
Explanation
This query joins table TBLalbum
(aliased as a
from "album") with TBLphotos
(aliased as f
from "first photo of this album"). Because of the INNER JOIN
, the albums without photos will not be included. Use LEFT JOIN
instead if you want them included.
Next, it joins the table TBLalbum
again (aliased as o
from "other photo"). The first join condition (o.albumid = f.albumid
) ensures the photos selected from o
are in the same album as their matches from f
. The second condition (o.photoid < f.photoid
) pairs a row from f
1 only with the rows from o
having a smaller photoid
(i.e. created earlier).
Replace this with your own definition of earlier in order to get the first photo as expected.
Because the second JOIN
is a LEFT JOIN
, it will add to the result set all the rows from f
1, including those that does not have any match in o
because there is no photo in o
that is earlier than the photo currently selected from f
. For these rows, a row full of NULL
s is used instead of the missing row from o
.
The WHERE
clause filters from the result set produced by the joins only the pairs (a
, f
, o
) that have NULL
values in o.photoid
; i.e. when no "other photo" in the same album was found in o
as being earlier than a certain photo from f
. These are the first photos of each album you expect.
Notes
1 The LEFT JOIN
combines the tables a
and f
already joined (left) with table o
(right). The result set will contain all the combinations of rows produced by the INNER JOIN
of a
and f
, at least once (and not all the rows from f
as said above; I preferred to write it this way to make it simpler).
2 I assumed there is a field named photoid
in table TBLphotos
and it is the PK
of the table (it's important to have distinct values). If you change the sorting criterion to define the first photo from the table (for example, using the upload time) and the field you use does not have distinct values then on tie the query returns all the photos having the same smallest value for the chosen column. You can fix this easily by keeping/adding into the ON
clause the condition that uses photoid
(or other field that could solve the tie).
3 You can put whatever fields you want/need into the SELECT
clause but only from tables a
and f
. Nothing stops you to put fields from o
but all of them are NULL
. Read the entire explanation again if you didn't understand why.
4 If you want to understand how this query works then remove the WHERE
clause, run it using a MySQL client and analyze the result set.
5 You'll find a lot of answers to similar questions on SO tagged greatest-n-per-group. If they use GROUP BY
avoid them because they are wrong and not conformant with the SQL standard (most of them work by accident). If they use sub-queries (with or without GROUP BY
in the inner query) they are probably correct but most of the time they are slow. Sub-queries are difficult to optimize, and a lot of times it is not even possible. If they use MySQL variables then they are also slow. I don't know why MySQL variables are slow, I guess it's because the query planner cannot make the best use of indexes in this case.