4

I have a table of metadata for updates to a software package. The table has columns id, name, version. I want to select all rows where the name is one of some given list of names and the version is maximum of all the rows with that name.

For example, given these records:

+----+------+---------+
| id | name | version |
+----+------+---------+
| 1  | foo  | 1       |
| 2  | foo  | 2       |
| 3  | bar  | 4       |
| 4  | bar  | 5       |
+----+------+---------+

And a task "give me the highest versions of records "foo" and "bar", I want the result to be:

+----+------+---------+
| id | name | version |
+----+------+---------+
| 2  | foo  | 2       |
| 4  | bar  | 5       |
+----+------+---------+

What I come up with so far, is using nested queries:

SELECT * 
  FROM updates 
  WHERE (
    id IN (SELECT id 
             FROM updates 
             WHERE name = 'foo' 
             ORDER BY version DESC 
             LIMIT 1)
  ) OR (
    id IN (SELECT id 
             FROM updates 
             WHERE name = 'bar' 
             ORDER BY version DESC 
             LIMIT 1)
  );

This works, but feels wrong. If I want to filter on more names, I have to replicate the whole subquery multiple times. Is there a better way to do this?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
adam
  • 791
  • 6
  • 21
  • My go-to answer for the case (as has been clarified): http://stackoverflow.com/questions/3800551/sql-select-first-row-in-each-group-by-group/7630564#7630564 – Erwin Brandstetter Apr 20 '13 at 12:50

3 Answers3

5
select distinct on (name) id, name, version
from metadata
where name in ('foo', 'bar')
order by name, version desc
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • I am a big fan of `DISTINCT ON`, but it's not the right tool when looking for `all rows` - which can include duplicate names. More importantly: this retrieves one row (with with the maximum version) for *every* `name` passing the `WHERE` clause, which is subtly different from what the question asks for, AIUI. – Erwin Brandstetter Apr 20 '13 at 12:11
  • This seems to do what I want and looks the simplest of all the answers so far, so I'm accepting it. Thanks! – adam Apr 20 '13 at 12:15
  • 1
    @adam: Upon re-reading I see that I seem to have misunderstood your question. – Erwin Brandstetter Apr 20 '13 at 12:16
  • @erwin: could be that I haven't phrased my question correctly. I want, for each name, the row with the maximum version of all the rows with that name. For each name, I want only one row. Doesn't this answer do that? – adam Apr 20 '13 at 12:20
  • To the OP: What if there are *two* (or more) records for a name, each with the *same* maximal value for version? (this is the *ties* issue in my discussion with Erwin) – wildplasser Apr 20 '13 at 12:24
  • For one, I missed `with that name` in `the version is maximum of all the rows with that name`. This was just a misunderstanding on my side. The other issue is with *ties*, like @wildplasser just pointed out. It is not clear from the question that `(name, version)` would be defined unique. In this case you would need to clarify how to break ties or retrieve all qualifying peers with a method like I have in my updated answer. – Erwin Brandstetter Apr 20 '13 at 12:29
  • 1
    In my case, `(name, version)` is always unique. I should have put this into the original question. – adam Apr 20 '13 at 12:35
  • @adam: Like 9 out of 10 people forget about that. – Erwin Brandstetter Apr 20 '13 at 12:36
3

NOT EXISTS is a way to avoid unwanted sub optimal tuples:

SELECT * 
FROM updates uu
WHERE uu.zname IN ('foo', 'bar')
AND NOT EXISTS (
    SELECT *
    FROM updates nx
    WHERE nx.zname = uu.zanme
    AND nx.version > uu.version
    );

Note: I replaced name by zname, since it is more or less a keyword in postgresql.

wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • `name` is [*not* a reserved word at all](http://www.postgresql.org/docs/current/interactive/sql-keywords-appendix.html), but it's still bad practice to use it as identifier, because its not descriptive. – Erwin Brandstetter Apr 20 '13 at 11:57
  • I thought it was used as a typename for table and column names in older versions. But maybe I just like to prefix everything with a `z`... – wildplasser Apr 20 '13 at 12:10
  • Also, and more importantly, I think the query is *incorrect* in the same way @Clodoaldo's query is. It gets maximum versions *per name* instead of rows with the *maximum of all rows*. – Erwin Brandstetter Apr 20 '13 at 12:15
  • The example (desired output) in the OQ does suggest that the OP wants the maximal version *per name*. Not sure, though (the OQ does has the `WHERE (subquery) OR (subquery)` , though) (s)he does not mention what to do in the case of ties. – wildplasser Apr 20 '13 at 12:19
  • You are right. I seem to have misread the question. There is only the matter of ties that is overlooked so often. – Erwin Brandstetter Apr 20 '13 at 12:20
  • Yeah, that's why you picked rank() instead of row_number(). I hesitated too, to add a tie breaker (would imply an ugly OR in the subquery condition ...) – wildplasser Apr 20 '13 at 12:22
  • Actrually, your query includes all peers and is subtly different from @Clodoaldo's version in that. Technically, yours (or mine) is correct. Although the OP may want @Clodoaldo's version without having said so in the Q. Most likely `(name, version)` is unique or should be to begin with. – Erwin Brandstetter Apr 20 '13 at 12:33
  • 1
    Yes, SQL is easy: *once you master the syntax, everything becomes a data modelling issue* – wildplasser Apr 20 '13 at 12:36
2

Update after rereading the Q:

I want to select all rows where the name is one of some given list of names and the version is maximum of all the rows with that name.

If there can be ties (multiple rows with the maximum version per name), you could use the window function rank() in a subquery. Requires PostgreSQL 8.4+.

SELECT *
FROM  (
   SELECT *, rank() OVER (PARTITION BY name ORDER BY version DESC) AS rnk
   FROM   updates 
   WHERE  name IN ('foo', 'bar')
   )
WHERE rnk = 1;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228