How do I join tables to choose the maximum row in the joined table?

Question

I have two tables and I would like to left join them on an attribute, but rather than taking all matching rows in the second table I would like to choose only the entry with the highest number (latest date, etc.) in a certain column. How do I express this result in SQL?

Here's a simplified example to clarify my question.

Table `colors`
| color |
+-------+
| red   |
| green |
| blue  |


Table `inventory`
| color | value | shape    |
+-------+-------+----------|
| red   | 1     | square   |
| red   | 2     | circle   |
| green | 7     | triangle |


Desired output:
| color | value | shape    |
+-------+-------+----------|
| red   | 2     | circle   |
| green | 7     | triangle |
| blue  | NULL  | NULL     |

My tables are fairly large so ideally a solution would be reasonably efficient. (There's no need for fine tuning, just trying to avoid double joins which could become huge.)

score 2 · Answer 1 · answered Jan 19 '18 at 21:34

2

select c.color, i2.value, i2.shape
from colors c
left join 
(
   select color, max(value) as value
   from inventory 
   group by color
) i on c.color = i.color
left join inventory i2 on i2.color = i.color
                      and i2.value = i.value

answered Jan 19 '18 at 21:34

juergen d

201,996
37
293
362

This works, but is there a way to avoid a double join? – Charles Jan 19 '18 at 22:04
@Charles A double join likely won't be a performance problem. `color` will be indexed as a foreign and primary key so both joins will be very fast. Indexing `value` might speed up the subselect. – Schwern Jan 19 '18 at 22:40
1

@Charles To improve performance you could replace the subselect with a [materialized view](https://en.wikipedia.org/wiki/Materialized_view), basically a cache of the subquery that looks like a table. It also makes queries easier to read, rather than joining on a subquery you'd join on the name of the view, perhaps `best_colors`, like any other table. But MySQL doesn't support materialized views. However, you can create your own [summary tables](http://mysql.rjweb.org/doc.php/summarytables) and keep them updated in various ways depending on the nature of the data. – Schwern Jan 19 '18 at 22:48

Alex · Accepted Answer · 2018-01-19T22:07:15.003

2

http://sqlfiddle.com/#!9/0b75c/6

SELECT c.*, i.value, i.shape
FROM colors c
LEFT JOIN inventory i
ON c.color = i.color
LEFT JOIN inventory i_
ON i.color = i_.color
  AND i.value<i_.value
WHERE  i_.color IS NULL

http://sqlfiddle.com/#!9/0b75c/8

SELECT i.value, i.shape
FROM inventory i
LEFT JOIN inventory i_
ON i.color = i_.color
  AND i.value<i_.value
WHERE  i_.color IS NULL

edited Jan 19 '18 at 22:07

answered Jan 19 '18 at 21:34

Alex

16,739
1
28
51

Yes, this works. But is there a way to avoid double joins? – Charles Jan 19 '18 at 21:44
If you don't mind to lose `blue` in result. In that case you can have 1 self join. – Alex Jan 19 '18 at 21:48
I could live with dropping blue (I can always `UNION` it back) but I'd like to avoid joining inventory to itself. With 100k entries the intermediate table would have 10 billion entries... – Charles Jan 19 '18 at 22:03
this joins should work much faster than union – Alex Jan 19 '18 at 22:06
1

@Charles Relational databases are very good at joins, that's the "relational" part. If `color` and `value` are indexed (color should be as a key), it will do something much more efficient than brute force 10 billion operations. – Schwern Jan 19 '18 at 22:42
@Charles yes, you can do [single joins](https://stackoverflow.com/a/7745635/1057527) – machineaddict Jan 23 '18 at 12:44

How do I join tables to choose the maximum row in the joined table?

2 Answers2