Issue with DISTINCT in my mysql query

Question

Hello i am trying to make a website where i want a photo album.

Thing is that this site will have 2 tables, 1 table that stores name of photo album (TBLalbum) and a unique id for that album (id,albumname). The second table (TBLphotos) stores information about uploaded photos and to what album each photo belongs(albumid,photoname and some more columns). Now i want to list all album names which is not so hard to do, but also in same sql query i want it to show the first photo in each album.

I tried this:

SELECT DISTINCT albumid 
FROM TBLphotos INNER JOIN TBLalbum 
ON TBLalbum.id=TBLphotos.albumid

this does not give me the possibility to print out the column photoname

and if i include it like :

SELECT DISTINCT albumid,photoname 
FROM TBLphotos INNER JOIN TBLalbum 
ON TBLalbum.id=TBLphotos.albumid

then it doesn't get distinct on just albumid.

A bit hard for me to explain, but can anyone tell me how to solve this?

I tried this to:"SELECT DISTINCT TBLalbum.id,TBLalbum.namn FROM TBLalbum LEFT JOIN TBLphotos ON TBLphotos.albumid=TBLalbum.id" and this works but i allso want to print the column photoname from TBLphotos without making it repeat the album name. How? — Oskar Oskarsson, Mar 18 '15 at 16:39
Possible duplicate, see here http://stackoverflow.com/questions/2043259/sql-server-how-to-join-to-first-row — nobody, Mar 18 '15 at 18:06
What do you want it to output in the `TBLalbum.id` column when there is more than one `TBLphotos.photoname`? If you just want it to show the album's id once and then any repeated rows would be null, then that is not something that you do in a database. That is something you will need to do in your presentation end like PHP, or wherever you are consuming the resultset. — JNevill, Mar 18 '15 at 18:06
There is no such thing like `DISTINCT` on a single column. What values do you expect to get on the other columns? — axiac, Mar 18 '15 at 18:10

Paul Griffin · Answer 1 · 2015-03-18T18:49:23.607

1

Try:

SELECT
  a.albumid,
  a.albumname,
  (SELECT photoname FROM TBLphotos p WHERE p.albumid = a.id LIMIT 1) AS photoname
FROM TBLalbum a

The subquery limits the result set for each album to just one record with the LIMIT clause. If you want a specific record from TBLphotos, you can amend the subquery to put things in a certain order or do whatever you need to do.

Note that DISTINCT is for filtering out duplicate rows, which is not (I think) what you are really trying to do here.

edited Mar 18 '15 at 18:49

answered Mar 18 '15 at 18:03

Paul Griffin

2,416
15
19

Hello and thanks for answering, i started myself to belive i needed a subquery, sounds like this is the way doing this, but i still not had this to work, i can imagine i need replace "TOP 1" with "LIMIT 1" though i use mysql but there is still something that makes it not work.. i will try figure out, but if you can do it faster i am glad :) – Oskar Oskarsson Mar 18 '15 at 18:46
@OskarOskarsson Oops, I usually work in TSQL, forgot to convert that to MySQL for you. Just a sec. – Paul Griffin Mar 18 '15 at 18:48
Thanks allot! now it works perfect like i want it. I really need try to understand the sql language better, where, why and how to use alias and so on. But this helped me for this time. Thanks once again! – Oskar Oskarsson Mar 18 '15 at 19:16
@OskarOskarsson you need to use aliases for tables when you use the same table two or more times in a query (to solve ambiguities). Other than that, I frequently use one or two-letter aliases to have shorter names for the tables involved in the query. When the query uses a field name that appears in two or more tables you need to specify the owning table and a two-letter alias is shorter and easier to read. – axiac Mar 19 '15 at 07:22
Ok thanks for information, i have not had to use 2 tables at same query before so thanks again. – Oskar Oskarsson Mar 23 '15 at 00:11

score 1 · Answer 2 · answered Mar 18 '15 at 18:48

The correct query for your request is:

SELECT a.albumid, f.photoname 
FROM TBLalbum a
    INNER JOIN TBLphotos f          # "f" from "first" photo
      ON f.albumid = a.id           #          of this album
    LEFT JOIN TBLphotos o           # "o" from "other" photo
      ON o.albumid = f.albumid      #          of the same album
      AND o.photoid < f.photoid     # "o" is before "f"
WHERE o.photoid IS NULL             # no such "o" exists

Explanation

This query joins table TBLalbum (aliased as a from "album") with TBLphotos (aliased as f from "first photo of this album"). Because of the INNER JOIN, the albums without photos will not be included. Use LEFT JOIN instead if you want them included.

Next, it joins the table TBLalbum again (aliased as o from "other photo"). The first join condition (o.albumid = f.albumid) ensures the photos selected from o are in the same album as their matches from f. The second condition (o.photoid < f.photoid) pairs a row from f¹ only with the rows from o having a smaller photoid (i.e. created earlier). Replace this with your own definition of earlier in order to get the first photo as expected.

Because the second JOIN is a LEFT JOIN, it will add to the result set all the rows from f¹, including those that does not have any match in o because there is no photo in o that is earlier than the photo currently selected from f. For these rows, a row full of NULLs is used instead of the missing row from o.

The WHERE clause filters from the result set produced by the joins only the pairs (a, f, o) that have NULL values in o.photoid; i.e. when no "other photo" in the same album was found in o as being earlier than a certain photo from f. These are the first photos of each album you expect.

Notes

¹ The LEFT JOIN combines the tables a and f already joined (left) with table o (right). The result set will contain all the combinations of rows produced by the INNER JOIN of a and f, at least once (and not all the rows from f as said above; I preferred to write it this way to make it simpler).

² I assumed there is a field named photoid in table TBLphotos and it is the PK of the table (it's important to have distinct values). If you change the sorting criterion to define the first photo from the table (for example, using the upload time) and the field you use does not have distinct values then on tie the query returns all the photos having the same smallest value for the chosen column. You can fix this easily by keeping/adding into the ON clause the condition that uses photoid (or other field that could solve the tie).

³ You can put whatever fields you want/need into the SELECT clause but only from tables a and f. Nothing stops you to put fields from o but all of them are NULL. Read the entire explanation again if you didn't understand why.

⁴ If you want to understand how this query works then remove the WHERE clause, run it using a MySQL client and analyze the result set.

⁵ You'll find a lot of answers to similar questions on SO tagged greatest-n-per-group. If they use GROUP BY avoid them because they are wrong and not conformant with the SQL standard (most of them work by accident). If they use sub-queries (with or without GROUP BY in the inner query) they are probably correct but most of the time they are slow. Sub-queries are difficult to optimize, and a lot of times it is not even possible. If they use MySQL variables then they are also slow. I don't know why MySQL variables are slow, I guess it's because the query planner cannot make the best use of indexes in this case.

Thanks for taking allot of time writing this and to help, Paul Griffin made it work and i am sure allso your example works good, i am happy for both of your fast response and i am impressed. Thanks allot! — Oskar Oskarsson, Mar 18 '15 at 19:20
@OskarOskarsson I would commend this answer to you. I am not 100% convinced that it is faster than mine, but it is far more flexible in how you can shape your data. — Paul Griffin, Mar 18 '15 at 20:36
@PaulGriffin I think your query can be optimized by MySQL query optimizer. Its processing is very similar with the one of an `INNER JOIN` that has the `LIMIT` applied earlier. — axiac, Mar 19 '15 at 07:27
Ok thanks again, i will try both of your querys i just had a litte hurry to get it work and get online, but now i want to take time and try understand both this querys and what they do. Thanks again to you both. — Oskar Oskarsson, Mar 23 '15 at 00:14

Issue with DISTINCT in my mysql query

2 Answers2

Explanation

Notes