How to remove duplicates based on a certain column in SQL Server?

Question

If I have a table like this

fid name   date
---------------------
1   John1  2020-10-08
1   John2  2020-10-08
1   John3  2018-06-04
2   Tom1   2019-10-08

I want to preserve the row for each fid that has the most recent date. However, if there are multiple, keep only 1 (any of them is fine). So the final result should end up like

fid name   date
---------------------
1   John1  2020-10-08
2   Tom1   2019-10-08

Does anyone know how to do this in SQL Server? I use v14 (2017) if that matters.

Problem is if I group by fid and do Max(date), I get 1 record per fid, but then when I left join on it to get the other columns it gives me back 2 records since the most recent date shows up twice.

First it sounds like you want to remove rows from your table ("remove duplicates"), then it sounds like you merely want to select rows without duplicates ("to get the other columns"). Which do you want? `DELETE` or `SELECT`? — Thorsten Kettner, Oct 22 '21 at 06:53
Not delete original data, just get a query that doesn't include the ones I want gone. So I want to select. — omega, Oct 22 '21 at 13:51

score 6 · Accepted Answer · answered Oct 22 '21 at 06:22

6

We can use a deletable CTE along with ROW_NUMBER here:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY fid ORDER BY date DESC, name) rn
    FROM yourTable
)

DELETE
FROM cte
WHERE rn > 1;

The above logic will assign rn = 1 (i.e. spare) the record with the most recent date, per group of fid records. Should two records with the same fid also have the same latest date, then it spares the earlier name.

answered Oct 22 '21 at 06:22

Tim Biegeleisen

502,043
27
286
360

Note that I want to just select, not delete any actual data. – omega Oct 22 '21 at 13:52
@omega: Then replace `DELETE` with `SELECT *` (or rather the columns you want to select). – Thorsten Kettner Oct 22 '21 at 14:04
@omega Then use the same CTE but do `SELECT * FROM cte WHERE rn = 1` – Tim Biegeleisen Oct 22 '21 at 15:09

How to remove duplicates based on a certain column in SQL Server?

1 Answers1