0

I have a table lets say Tab1

id dt price
1 01/01/2021 50
2 01/22/2021 70.8
1 01/01/2021 49.99
2 01/21/2021 69
1 01/14/2021 54

Now, I must delete all the rows that have duplicate date and id column, irrespective of the value column (any one can stay, because values will be very near) So, the final table after the operation would be

id dt price
1 01/01/2021 50
2 01/22/2021 70.8
2 01/21/2021 69
1 01/14/2021 54

(No order necessary)

What is the fastest way in terms of time taken to run the query, to solve this?

My Question and this question is different as not all the columns of the duplicate rows of mine are same How to delete duplicate rows in SQL Server?

Thom A
  • 88,727
  • 11
  • 45
  • 75
  • One way I have come up is that I thought about keeping the track of id and date data of duplicate rows in a temp table, then join the table on itself and delete the duplicate rows, and then, reinsert the data. – Debjit Ghosh Jul 21 '22 at 11:01
  • 1
    You misunderstand the technique referenced in your link. It does not matter which set of columns is used to identify "duplicate". It can be every column or a subset of columns. You simply manipulate the ROW_NUMBER logic to partition by that set of columns and include the appropriate ordering. And since you are trying to remove duplicates, now is also a good time to **prevent** duplicates in the future once the table is cleansed. – SMor Jul 21 '22 at 12:03
  • I too, fail to see how the duplicate is still not a Duplicate. You want to delete (some of the) rows where the `id` and`date` are the same; the duplicate demonstrates how to do that with a CTE and `ROW_NUMBER`. – Thom A Jul 21 '22 at 12:10
  • `WITH CTE AS (SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY (SELECT NULL)) FROM YourTable) DELETE FROM CTE WHERE RN > 1` – Charlieface Jul 21 '22 at 13:16

0 Answers0