I have a table similar to the following.
|name | grp | dt
------------------------------
|foo | A | 2016-01-01
|bar | A | 2016-01-02
|hai | B | 2016-01-01
|bai | B | 2016-01-02
|baz | C | 2016-01-01
For each group, I want to find the name
whose dt
is the most recent. In other words, MAX(dt), GROUP by grp, and associate the name whose dt is the max of the group to the output:
|name | grp | dt
------------------------------
|bar | A | 2016-01-02
|bai | B | 2016-01-02
|baz | C | 2016-01-01
In Oracle, the following query works and is very clean (taken from here):
SELECT o.name, o.grp, o.dt
FROM tab o
LEFT JOIN tab b
ON o.grp = b.grp AND o.dt < b.dt
WHERE b.dt IS NULL
However this fails with [Error 10017]: Line 4:43 Both left and right aliases encountered in JOIN 'service_effective_from'
From another question quoting the documentation, I learn that I cannot use an inequality operator in a join statement:
Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job.
What is a clean solution for obtaining this in Hive, given that I cannot use an inequality operator in a join condition?