group by behavior

Question

I need to find the rows with the greatest value in their group.

I did a lot of research and I found plenty of solutions: join, temporary table, etc.

Still I wanted to try something with "order by ... having ..." and I found something weird.

This is my query:

SELECT MAX(id) AS MaxValue, MyTable.* 
FROM MyTable
GROUP BY name
HAVING id = MaxValue

I get a list of rows as result but I don't have a row for every name.

I obtain many rows when I do this:

SELECT * FROM MyTable WHERE name = 'John'

But the result of the first query doesn't contain any row with the name 'John' and I don't get anything if I do this:

SELECT MAX(id) AS MaxValue, MyTable.* 
FROM MyTable
GROUP BY name
HAVING id = MaxValue AND name = 'John'

I don't really understand how it works and it triggers me.

Could anyone explain me this behavior?

I did't ask "how to do that". I just didn't understand how "group by ... having" works.

*"I need to find the rows with the greatest value in their group."* -- `GROUP BY` doesn't help for that. It doesn't return rows from the table, it **generates new rows** using the data from each group. It computes each expression from the `SELECT` clause independent of the others. As a general rule, `SELECT * ... GROUP BY` is invalid SQL. — axiac, May 10 '17 at 15:49
What do you mean by saying "invalid" about `SELECT * ... GROUP BY` ? I use mySQL and I don't get any errors. You mean it is bad practice? — Newbie, May 10 '17 at 15:56
Invalid means incorrect. The SQL standard sets some restrictions to the expressions that can appear in the `SELECT` clause of a `GROUP BY` query (for a good reason). Most RDBMSes follow the standard and reject invalid queries. Before version 5.7.5, MySQL accepts them but [takes its freedom to return indeterminate results](https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html). Starting with 5.7.5 the behaviour is configurable (by default it rejects them) and I guess in the future it won't accept them any more. Read more: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html — axiac, May 10 '17 at 16:02
I seee. And I imagine it is the same problem when I do this: `SELECT id, MAX(id) AS MaxValue, FROM MyTable GROUP BY name HAVING id = MaxValue` — Newbie, May 10 '17 at 16:12
Take a look at [this answer](http://stackoverflow.com/a/28090544/4265352). If you query a single table then you get the idea from that answer and write the query that matches your needs. It is the fastest way to do the job. If you need to query multiple tables (using `JOIN`s) then you better try one of the queries provided in the answers below. They won't run that fast but they will do the job without other issues. — axiac, May 10 '17 at 16:31

score 0 · Answer 1 · answered May 10 '17 at 15:45

0

if you want the rows for the max id group by name a way is

  select * from 
  MyTable where id in (SELECT MAX(id) 
        FROM MyTable
        GROUP BY name
   )

or if you need only the row for john

  select * from 
  MyTable where id in (SELECT MAX(id) 
        FROM MyTable
        GROUP BY name
   )
  where name = 'John';

answered May 10 '17 at 15:45

ScaisEdge

131,976
10
91
107

Thank you for your suggestion but I know them already. My problem is that I don't understand the behavior of the first query. Why do I have missing rows? – Newbie May 10 '17 at 15:50
that select format is driving me crazy :/ – Juan Carlos Oropeza May 10 '17 at 15:52
This assume `id` is a primary key or unique value in `MyTable` even when is a good assumption isnt guaranted. – Juan Carlos Oropeza May 10 '17 at 15:57
@JuanCarlosOropeza may be .. hope that the OP give us some suggestion for this .. .. – ScaisEdge May 10 '17 at 16:01

Juan Carlos Oropeza · Answer 2 · 2017-05-10T16:13:11.100

0

What you need to do is calculate the max id for each name. For that you use a simple GROUP BY.

SELECT name, MAX(id) as MaxValue
FROM YourTable
GROUP BY name

Now you perform a JOIN or one IN comparasion to see which row has the MaxValue. This have many ways to do it. You always should always check the EXPLAIN PLAN to see what is the optimizer doing and if are using the proper index.

I rather use the JOIN aproach. Where you try to match your table with the values already found in the previous query.

SELECT *
FROM YourTable YT
JOIN ( SELECT name, MAX(id) as MaxValue
       FROM YourTable
       GROUP BY name 
     ) MaxT
  ON YT.name = MaxT.name
 AND YT.id = MaxT.MaxValue

Now this can have a problem if ID isnt an unique value for each name, and can return duplicate rows so you need take aditional consideration if only want one result.

There is also the option of use a subquery for each row. Here you take the name from the outer Table and use it to find the Max(ID) for that name in the inner subquery. This method also can have the same issue with duplicates

 SELECT *
 FROM YourTable YT
 WHERE id = ( SELECT MAX(MaxT.id)
              FROM YourTable MaxT
              WHERE MaxT.name = YT.name )

edited May 10 '17 at 16:13

answered May 10 '17 at 16:06

Juan Carlos Oropeza

47,252
12
78
118

I found out already about this solution. Still it's good to have it here! – Newbie May 10 '17 at 16:24
I noticed that writing `SELECT name, MAX(id) as MaxValue FROM YourTable GROUP BY name` is ok. But this is not: `SELECT any_other_column, name, MAX(id) as MaxValue FROM YourTable GROUP BY name`. Is there a rule saying that a selected value can be only the field used for group or a specific function, when using "group by"? – Newbie May 10 '17 at 16:25
Yes there is a rule .... but in mysql you can ignore it (we dont like that btw). New version have a way to turn on / off that rule. But the rule is every field on the select must be on the `GROUP BY` or in a aggregated function like `MAX`, `COUNT` or `SUM` – Juan Carlos Oropeza May 10 '17 at 23:10
So do you still have a question? – Juan Carlos Oropeza May 10 '17 at 23:13
It is clearer now how GROUP BY works. I was triggered by this comportment and now I better understand it. Thank you! – Newbie May 17 '17 at 06:37

group by behavior

2 Answers2