-1

Given a TASK_ID, what is the most efficient way to retrieve the latest version of each file?

Here is an example table I am working with.

ID    TASK_ID    FILENAME    VERSION_NUMBER    RANDOM_COLUMN
1     1          a.txt       1                 112
2     1          b.txt       1                 231231
3     1          c.txt       1                 234356
4     1          a.txt       2                 35
5     1          b.txt       2                 84
6     1          b.txt       3                 97604
7     2          aa.txt      1                 6956
8     2          bb.txt      1                 9760054

This is what I have so far.

SELECT FILENAME, TASK_ID, MAX(VERSION_NUMBER)
FROM TABLE_NAME
GROUP BY FILENAME, TASK_ID
HAVING TASK_ID = '1';

Unfortunately, the above only gives me the 3 columns I have in the SELECT statement instead of the entire row and adding more columns will give me an error.

FILENAME    TASK_ID    MAX(VERSION_NUMBER)
a.txt       1          2
b.txt       1          3
c.txt       1          1

How might I go about getting the entire row?

I plan on adding this SQL query to a Query annotation within a Spring Repository and run it as a native query. Is that the best way?

EDIT: I ended up with this SQL query, which gets what I want, but I am not sure whether this is correct or how efficient it is.

SELECT ta.*
FROM TABLE_NAME ta
INNER JOIN (SELECT FILENAME, TASK_ID, MAX(VERSION_NUMBER) vn
  FROM TABLE_NAME
  GROUP BY FILENAME, TASK_ID
  HAVING TASK_ID = '1') sq
ON ta.FILENAME = sq.FILENAME
AND ta.TASK_ID = sq.TASK_ID
AND ta.VERSION_NUMBER = sq.vn;

Result of above SQL query

ID    TASK_ID    FILENAME    VERSION_NUMBER    RANDOM_COLUMN
3     1          c.txt       1                 234356
4     1          a.txt       2                 35
6     1          b.txt       3                 97604
dalawh
  • 886
  • 8
  • 15
  • 37

2 Answers2

0

Use row_number():

SELECT ta.*
FROM (SELECT ta.*,
             ROW_NUMBER() OVER (PARTITION BY FILENAME, TASK_ID ORDER BY VERSION_NUMBER DESC) as seqnum
      FROM TASK_ARTIFACT ta
     ) ta
WHERE seqnum = 1;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • The result of your query includes the seqnum in the results. Is there an alternative way to not include it besides changing the SELECT statement from ta.* to ta.ID, ta.TASK_ID, ta.FILENAME, etc? – dalawh Jun 23 '17 at 14:11
  • @dalawh . . . Select only the columns that you want. I use `select *` as a convenience when answering questions. – Gordon Linoff Jun 24 '17 at 02:09
0

Yes it's correct and don't see anything wrong with your current approach (edited one) .. I would do the same. Except that, the condition in HAVING should go to WHERE like

SELECT ta.*
FROM TABLE_NAME ta
INNER JOIN (SELECT FILENAME, TASK_ID, MAX(VERSION_NUMBER) vn
  FROM TABLE_NAME
  WHERE TASK_ID = '1'
  GROUP BY FILENAME, TASK_ID
  ) sq
ON ta.FILENAME = sq.FILENAME
AND ta.TASK_ID = sq.TASK_ID
AND ta.VERSION_NUMBER = sq.vn;
Rahul
  • 76,197
  • 13
  • 71
  • 125