ROW_NUMBER() over two columns, if, 2 rows are the same then

Question

I have data like this:

item_no p_no    date        RN()
35917   1220540 2000-04-03  1
35917   1220540 2000-04-18  1
35917   1220540 2001-02-12  1
35917   1220540 2001-03-08  1
35917   1220540 2001-03-19  1
542672  1243288 2000-01-24  1
564575  1243288 2000-01-24  2
549816  1243288 2000-01-24  3
542672  1243288 2000-02-25  1
564575  1243288 2000-02-25  2
549816  1243288 2000-02-25  3

I want to select one record for every p_no, with max date (the most recent record). If there are 2 rows with same p_no and date, i need to check item_no and select record with the highest. Example:

p_no= 1243288 has two dates- 2000-02-25 and 2000-01-24
date 2000-02-25 is GREATEST so records with date=2000-02-25 is what I'am looking
there are 3 records with same group (p_np, date) so i have to select record with largest item_no
record I am interested with :

564575 1243288 2000-02-25 2

I used ROW_NUMBER() to get column RN

row_number() over (partition by p_no, date order by date desc) rnk,

but i have no idea how to use this column and check item_no to select what i need. Is my approach is wrong? Any suggestions?

Just add `ITEM_NO desc` to the `order by` clause of the analytic function. Incidentally, if you're using 12c [check out this answer](https://stackoverflow.com/a/43028479/146325) — APC, Aug 08 '17 at 07:11

ssn · Answer 1 · 2017-08-08T07:54:41.110

Consider this example:

1 We replicate your table

 CREATE TABLE #tmp (
 item_no int
 ,p_no int
 ,[date] datetime
 )

2 Populate it with same data as you

INSERT INTO #tmp (item_no,p_no,[date]) VALUES (35917,1220540,'2000-04-03')
INSERT INTO #tmp (item_no,p_no,[date]) VALUES (35917,1220540,'2000-04-18')
INSERT INTO #tmp (item_no,p_no,[date]) VALUES (35917,1220540,'2001-02-12')
INSERT INTO #tmp (item_no,p_no,[date]) VALUES (35917,1220540,'2001-03-08')
INSERT INTO #tmp (item_no,p_no,[date]) VALUES (35917,1220540,'2001-03-19')
INSERT INTO #tmp (item_no,p_no,[date]) VALUES (542672,1243288,'2000-01-24')
INSERT INTO #tmp (item_no,p_no,[date]) VALUES (564575,1243288,'2000-01-24')
INSERT INTO #tmp (item_no,p_no,[date]) VALUES (549816,1243288,'2000-01-24')
INSERT INTO #tmp (item_no,p_no,[date]) VALUES (542672,1243288,'2000-02-25')
INSERT INTO #tmp (item_no,p_no,[date]) VALUES (564575,1243288,'2000-02-25')
INSERT INTO #tmp (item_no,p_no,[date]) VALUES (549816,1243288,'2000-02-25')

3 Create our select statement

SELECT * FROM (
SELECT
    T.item_no
   ,T.p_no
   ,max(T.[date]) AS [max_date]
   ,row_number() OVER (PARTITION BY T.p_no ORDER BY T.item_no desc) AS rn
FROM
    #tmp AS T
GROUP BY
    T.item_no
   ,T.p_no
) AS A
WHERE
    rn = 1

First we find the maximum date. Next we partition the set by p_no in the order of highest item_no and give those rows the value of 1. Then we select everything with a row_number = 1.

EDIT: Shorter/alternative solution:

SELECT * FROM (
SELECT
    T.item_no
   ,T.p_no
   ,T.[date]
   ,row_number() OVER (PARTITION BY T.p_no ORDER BY T.[date] desc, T.item_no desc) AS rn
FROM
    #tmp AS T
) AS A
WHERE
    rn = 1

this seems a very complicated way of achieving the goal; what does it offer over adding item no to the analytic function order clause ? — Caius Jard, Aug 08 '17 at 07:26

Caius Jard · Answer 2 · 2017-08-08T06:36:10.943

Add item_no to the ORDER BY part of your row number function so it becomes ORDER BY date desc, item_no desc). The idea is that it breaks a tie between same elements in the partition by telling which item is to come first if the date is the same

To make use of your rnk: Wrap your whole select statement inside another

SELECT * FROM ( <your select statement> ) WHERE rnk = 1

Example

SELECT * FROM (

  select employee.*, row_number() over(partition by department order by salary desc) as rnk

) WHERE rnk = 1 --get highest paid in each dept

ROW_NUMBER() over two columns, if, 2 rows are the same then

2 Answers2