Grouping by Column with Dependence on another Column

Question

Here is a simplified look at the problem I am trying to cleanly solve via a MySQL query. This is not the actual table I am dealing with.

If I have the following table:

Name Buyer ID  
John Fred  4  
John Smith 3  
Fred Sally 2  
John Kelly 1

I would like a query to return the following:

Name Buyer ID      
John Fred  4  
Fred Sally 2

Such that we group by 'name' and show the latest row / buyer / ID.

I tried to implement this by performing a nested select statement, wherein I first performed "ORDER BY ID DESC" then, on the outermost SELECT, "GROUP BY NAME". And, while this is a roundabout way of solving the problem, it seemed that, by virtue of the ordering, the correct selection would be returned to me. Unfortunately, "GROUP BY" does not 'guarantee' that the 'Buyer' column will contain the expected entry.

Any helpful suggests for implementing this as a query? At the moment, I have a highly-inefficient PHP 'version' of the query running on a large table dump - definitely not the best choice.

`+1` for providing sample data and expected result. – John Woo Nov 16 '12 at 03:48 — John Woo, Nov 16 '12 at 03:48

score 4 · Accepted Answer · answered Nov 16 '12 at 03:47

4

Try this one, the idea behind the subquery is that it gets the latest ID for each Name using MAX (aggregate function). Then join it against the table itself on the two columns of the subquery.

SELECT  a.*
FROM    tableName a
        INNER JOIN 
        (
            SELECT name, MAX(ID) maxID
            FROM tableName
            GROUP BY name
        ) b ON a.Name = b.Name AND
                a.ID = b.MaxID

SQLFiddle Demo

answered Nov 16 '12 at 03:47

John Woo

258,903
69
498
492

I ended up fixing the problem with my query (similar to duellsy's example), but I will have to try this out. Would this perform better than a subquery? – SDM Nov 16 '12 at 07:15
The point of this query is that it runs on all rdbms. Try running your accepted if it works on MSSQL. Indexing your field will have also a great impact on the performance of the query. :D – John Woo Nov 16 '12 at 07:19
Ah, I found an interesting difference between the two - http://sqlfiddle.com/#!2/1d483/2 vs http://sqlfiddle.com/#!2/1d483/1 - the way they handle duplicates appears to be difference. Shouldn't the 'GROUP BY' in your query prevent that from happening? – SDM Nov 16 '12 at 08:11
`DISTINCT` can solved that http://sqlfiddle.com/#!2/1d483/3 Anyway it's up to you :D You didn;t mentioned that there are possibilities of duplicated rows on your table. – John Woo Nov 16 '12 at 08:13
Ah, I just thought that 'GROUP BY' would automatically stop duplicates from occurring, as it would group each 'name', so that only one row will exist. I guess I will have to do some more reading! :-P – SDM Nov 16 '12 at 08:18
On further inspection, this is absolutely the best way to solve this problem. Making assumptions on grouping and ordering can result in queries that are no bueno - John's version is very explicit, so it doesn't have that problem. – SDM Nov 16 '12 at 17:28
do you see the difference now? :D – John Woo Nov 16 '12 at 17:29

score 1 · Answer 2 · answered Nov 16 '12 at 03:53

1

Another alternative is to load the data sorted in a subquery, then group on the results. I can't cite this, but I've read in a few places there's no (discernable) performance hit on this.

So something like:

SELECT * 
FROM (
    SELECT * 
    FROM `yourtable` 
    ORDER BY `id` DESC
) as `tmp` 
GROUP BY `name`

answered Nov 16 '12 at 03:53

duellsy

8,497
2
36
60

FYI Found one reference about the performance of subqueries http://stackoverflow.com/a/356699/1613391 – duellsy Nov 16 '12 at 03:54
This is what I was using - it turns out I had an issue with one of my columns that resulted in an improper sort. I am glad that there is not a significant performance hit. :) – SDM Nov 16 '12 at 06:56

Grouping by Column with Dependence on another Column

2 Answers2

Linked