3

Same old question:

I have a table. I want the rows with max value for a field, grouped by a set of fields. BUT I do not need JUST the max value and group by fields, I need the full row (all other fields, so I can't just do:

select max(field1),  field2 from table group by field2; 

As I said, I need field3, field4, ... , fieldn.

This has been treated before, and I found an excelent answer here:

SQL Select only rows with Max Value on a Column

Basically, it says I should do a join between the original table, getting all fields from there, and a subquery that gets me the max and group by fields. And for most cases, that probably works.

My issue

For my particular case I am not so sure that approach works, because of the data types involved. Let me give an example.

Consider this gives me the max values I want, grouped by the field I want:

select max(field1) maxf1, field2 f2 from mytable group by f2

The above link would suggest trying something like:

SELECT mt.field1, mt.field2, mt.field3, ... , mt.fieldn
FROM mytable mt
INNER JOIN (
    SELECT max(field1) maxf1, field2 f2 FROM mytable GROUP BY f2
) sq
ON sq.maxf1 = mt.field1 AND sq.f2 = mt.field2

For my particular case, field1 and field2 may be of type FLOAT and TIMESTAMP. Therefore, the join comparisons sq.maxf1 = mt.field1 and sq.f2 = mt.field2 might not be the way to go (see What is the most effective way for float and double comparison?). So, any hints for me on this one?

Community
  • 1
  • 1
user2812541
  • 31
  • 1
  • 1
  • 2
  • 1
    I don't see the problem. There's no roundoff problem when you use `MAX()`. – Barmar Sep 24 '13 at 19:57
  • You only need to worry about floating point comparisons when you're comparing the results of calculations that might round slightly differently, or accumulate errors. This isn't the case when you're comparing a field in a table against itself. – Barmar Sep 24 '13 at 19:58

1 Answers1

3

Your query is fine:

SELECT mt.field1, mt.field2, mt.field3, ... , mt.fieldn
FROM mytable mt
INNER JOIN ( SELECT max(field1) maxf1, field2 f2 
             FROM mytable GROUP BY f2
           ) sq
   ON sq.maxf1 = mt.field1 
     AND sq.f2 = mt.field2

MAX() or MIN() will not result in float rounding issues, if you were aggregating or performing other calculations that would be a concern, but then you wouldn't be using the field to JOIN anyway.

FYI: If you were doing some other conversion or trying to JOIN using a FLOAT on one side and DECIMAL on the other a common method is choosing an acceptable margin of error and using absolute value and subtraction:

SELECT mt.field1, mt.field2, mt.field3, ... , mt.fieldn
FROM mytable mt
INNER JOIN ( SELECT max(field1) maxf1, field2 f2 
             FROM mytable GROUP BY f2
           ) sq
   ON ABS(sq.maxf1 - mt.field1) < .0001
     AND sq.f2 = mt.field2
Hart CO
  • 34,064
  • 6
  • 48
  • 63
  • Indeed the data type FLOAT itself has rounding issues so functions run that on the data with type FLOAT like MAX or MIN should also has problems see http://dev.mysql.com/doc/refman/5.0/en/problems-with-float.html... – Raymond Nijland Sep 24 '13 at 20:45
  • @RaymondNijland indeed it mentions that the internally stored value may not be the same as the value viewed, but it does not say that the internally stored value is not consistent. ie: I shouldn't count on comparing a decimal with a seemingly equivalent float, but I can count on FLOAT = FLOAT when only taking MAX()/MIN() and not doing any conversions. – Hart CO Sep 24 '13 at 20:56
  • true you are right about the MIN and MAX like the manual says you should lookout with functions like SUM and ABS or if you do an conversion. +1 for the ON ABS(sq.maxf1 - mt.field1) < .0001 edit – Raymond Nijland Sep 24 '13 at 21:14
  • Side Note you may also want to enlarge the tmp_table_size and max_heap_table_size settings if indexes dont work for this query if the settings these are to low MySQL will create an disk based myisam table what will kill MySQL performance. and also GROUP BY will trigger to Using temporary if you dont properly index and maybe even the DERIVED (nested select inside the inner join) could trigger an temporary table. see the documentatie about this http://dev.mysql.com/doc/refman/5.7/en/internal-temporary-tables.html – Raymond Nijland Sep 24 '13 at 21:27