0

I have two tables and I would like to left join them on an attribute, but rather than taking all matching rows in the second table I would like to choose only the entry with the highest number (latest date, etc.) in a certain column. How do I express this result in SQL?

Here's a simplified example to clarify my question.

Table `colors`
| color |
+-------+
| red   |
| green |
| blue  |


Table `inventory`
| color | value | shape    |
+-------+-------+----------|
| red   | 1     | square   |
| red   | 2     | circle   |
| green | 7     | triangle |


Desired output:
| color | value | shape    |
+-------+-------+----------|
| red   | 2     | circle   |
| green | 7     | triangle |
| blue  | NULL  | NULL     |

My tables are fairly large so ideally a solution would be reasonably efficient. (There's no need for fine tuning, just trying to avoid double joins which could become huge.)

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
Charles
  • 11,269
  • 13
  • 67
  • 105

2 Answers2

2
select c.color, i2.value, i2.shape
from colors c
left join 
(
   select color, max(value) as value
   from inventory 
   group by color
) i on c.color = i.color
left join inventory i2 on i2.color = i.color
                      and i2.value = i.value
juergen d
  • 201,996
  • 37
  • 293
  • 362
  • This works, but is there a way to avoid a double join? – Charles Jan 19 '18 at 22:04
  • @Charles A double join likely won't be a performance problem. `color` will be indexed as a foreign and primary key so both joins will be very fast. Indexing `value` might speed up the subselect. – Schwern Jan 19 '18 at 22:40
  • 1
    @Charles To improve performance you could replace the subselect with a [materialized view](https://en.wikipedia.org/wiki/Materialized_view), basically a cache of the subquery that looks like a table. It also makes queries easier to read, rather than joining on a subquery you'd join on the name of the view, perhaps `best_colors`, like any other table. But MySQL doesn't support materialized views. However, you can create your own [summary tables](http://mysql.rjweb.org/doc.php/summarytables) and keep them updated in various ways depending on the nature of the data. – Schwern Jan 19 '18 at 22:48
2

http://sqlfiddle.com/#!9/0b75c/6

SELECT c.*, i.value, i.shape
FROM colors c
LEFT JOIN inventory i
ON c.color = i.color
LEFT JOIN inventory i_
ON i.color = i_.color
  AND i.value<i_.value
WHERE  i_.color IS NULL

http://sqlfiddle.com/#!9/0b75c/8

SELECT i.value, i.shape
FROM inventory i
LEFT JOIN inventory i_
ON i.color = i_.color
  AND i.value<i_.value
WHERE  i_.color IS NULL
Alex
  • 16,739
  • 1
  • 28
  • 51
  • Yes, this works. But is there a way to avoid double joins? – Charles Jan 19 '18 at 21:44
  • If you don't mind to lose `blue` in result. In that case you can have 1 self join. – Alex Jan 19 '18 at 21:48
  • I could live with dropping blue (I can always `UNION` it back) but I'd like to avoid joining inventory to itself. With 100k entries the intermediate table would have 10 billion entries... – Charles Jan 19 '18 at 22:03
  • this joins should work much faster than union – Alex Jan 19 '18 at 22:06
  • 1
    @Charles Relational databases are very good at joins, that's the "relational" part. If `color` and `value` are indexed (color should be as a key), it will do something much more efficient than brute force 10 billion operations. – Schwern Jan 19 '18 at 22:42
  • @Charles yes, you can do [single joins](https://stackoverflow.com/a/7745635/1057527) – machineaddict Jan 23 '18 at 12:44