0

If I have a table like this

fid name   date
---------------------
1   John1  2020-10-08
1   John2  2020-10-08
1   John3  2018-06-04
2   Tom1   2019-10-08

I want to preserve the row for each fid that has the most recent date. However, if there are multiple, keep only 1 (any of them is fine). So the final result should end up like

fid name   date
---------------------
1   John1  2020-10-08
2   Tom1   2019-10-08

Does anyone know how to do this in SQL Server? I use v14 (2017) if that matters.

Problem is if I group by fid and do Max(date), I get 1 record per fid, but then when I left join on it to get the other columns it gives me back 2 records since the most recent date shows up twice.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
omega
  • 40,311
  • 81
  • 251
  • 474
  • First it sounds like you want to remove rows from your table ("remove duplicates"), then it sounds like you merely want to select rows without duplicates ("to get the other columns"). Which do you want? `DELETE` or `SELECT`? – Thorsten Kettner Oct 22 '21 at 06:53
  • Not delete original data, just get a query that doesn't include the ones I want gone. So I want to select. – omega Oct 22 '21 at 13:51

1 Answers1

6

We can use a deletable CTE along with ROW_NUMBER here:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY fid ORDER BY date DESC, name) rn
    FROM yourTable
)

DELETE
FROM cte
WHERE rn > 1;

The above logic will assign rn = 1 (i.e. spare) the record with the most recent date, per group of fid records. Should two records with the same fid also have the same latest date, then it spares the earlier name.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360