Why can you only select two columns when using MAX() in an INNER JOIN?

Question

The answer to this question says a row with a max value on a column can be selected like so:

select yt.id, yt.rev, yt.contents
from YourTable yt
inner join(
    select id, max(rev) rev
    from YourTable
    group by id
) ss on yt.id = ss.id and yt.rev = ss.rev

The answer to this question provides a similar solution:

SELECT tt.*
FROM topten tt
INNER JOIN
    (SELECT home, MAX(datetime) AS MaxDateTime
    FROM topten
    GROUP BY home) groupedtt 
ON tt.home = groupedtt.home 
AND tt.datetime = groupedtt.MaxDateTime

The two solutions are essentially the same, just with different table and column names. They both return two columns in the INNER JOIN, with one of them being the column MAX() is being used on.

I know there are a bazillion greatest-n-per-group questions on here, but what I'm wondering is why only two columns can be returned when using MAX() in an INNER JOIN?

If all of the columns could be returned, then there wouldn't be a need for the INNER JOIN (right?), so there must be a reason.

One last note: In the first query, the first column being returned appears to be a primary key. So my initial thought was maybe you can only return the primary key along with the column MAX() is being used on. However, in the second query the first column being returned is home and it isn't a primary key. So that blows my theory.

You can return as many aggregated values as you like -- `MIN()`s, `MAX()`s, `AVG()`s, `SUM()`s and more! I don't understand the question. — Gordon Linoff, Dec 18 '14 at 21:35
there's no limits on how many columns you can have in that subquery, but there's no point in selecting columns you'd never use in the parent query. — Marc B, Dec 18 '14 at 21:36
The reason is that you have to either put value in the group claus or use aggregate function — vittore, Dec 18 '14 at 21:36
@GordonLinoff If there's no limit to the number of columns you can return, then why is the subquery necessary? Why not just use max in the SELECT statement? — Nate, Dec 18 '14 at 21:37
@MarcB et al - For example, from what you're saying, why not just do `select MAX(yt.id), yt.rev, yt.contents from YourTable yt`? — Nate, Dec 18 '14 at 21:39
@Nate . . . Because `MAX()` only applies to the arguments of the function. It has no affect on other values. If you want to get the *rows* that have the `MAX()` value, then an aggregation query is not sufficient. — Gordon Linoff, Dec 18 '14 at 21:40
@nate: because if you mix aggregated fields (max()) and non-aggregated fields, then you CAN get undefined results if you don't set up your `group by` properly. mysql will do its best for you, but it can guess wrong. — Marc B, Dec 18 '14 at 21:40
@GordonLinoff Right, so then why can you return other columns from the `INNER JOIN`? If the other columns wouldn't correspond to the row with the `MAX()` value in the `SELECT` statement, then why do they in the `INNER JOIN`? Doesn't make sense.. — Nate, Dec 18 '14 at 21:41

JNevill · Accepted Answer · 2014-12-19T01:10:06.963

1

select yt.id, yt.rev, yt.contents
from YourTable yt
inner join(
    select id, max(rev) rev
    from YourTable
    group by id
) ss on yt.id = ss.id and yt.rev = ss.rev

AND

select yt.id, max(yt.rev), yt.contents
from YourTable yt    
group by id, contents

Will yield different answers. This is why they aren't the same...

That second statement will give you back the distinct id/contents combinations, and if there are records that share the same id/contents combinations, the max date of those shared records. Instead... the first one returns, in the subquery, each distinct ID and the maximum rev found in the table for that ID. THEN it returns the ID, that max rev, and only the contents where the max rev/ID exists.

Here's a SQLFIDDLE showing this example

edited Dec 19 '14 at 01:10

answered Dec 18 '14 at 21:52

JNevill

46,980
4
38
63

`the first one returns, in the subquery, a single distinct ID and the maximum rev found in the table`. So why is `yt.id = ss.id` in the `ON` clause if it's unrelated to `ss.rev`? – Nate Dec 18 '14 at 22:42
No it does not give back a single distinct id. It returns back every distinct ID and the highest rev associated with each distinct id. That is why both fields are included in the ON. – JNevill Dec 19 '14 at 00:48
`It returns back every distinct ID **and the highest rev associated with each distinct id**` -- now I'm super confused. This contradicts your answer. Doesn't it return the highest rev in the **table**, not `the highest rev associated with each distinct id`? – Nate Dec 19 '14 at 00:54
Yep. My fault. That was terribly worded. I have fixed it. – JNevill Dec 19 '14 at 01:01
I added a sqlfiddle that should help explain it since my words are failing me today. http://sqlfiddle.com/#!2/d0e9b/1 (also included in my answer) – JNevill Dec 19 '14 at 01:10
Thanks, I think I understand how the query works now. One of the things that was confusing me was that my `id` is autoincremented and is unique, so the query in the join seemed unnecessary to me, but I now realize the query is written in such a way that it will work on data where `id` is not unique. – Nate Dec 19 '14 at 17:33

Why can you only select two columns when using MAX() in an INNER JOIN?

1 Answers1