-1

I have a table which has around 300,000 rows. 225 Rows are being added to this table daily since March 16,2015 till July 09,2015

My problem is that, from last 1 week or so, some duplicate rows are being entered in the table (i.e more than 225 per day)

Now I want to select (and ultimately delete!) all the duplicate rows from the table that have more than 1 siteID+ reportID combination existing against one Date column .

Example is attached in the screenshot:

enter image description here

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
Saruchi
  • 346
  • 2
  • 13
  • 2
    Once you've mopped the floor, don't forget to fix the leak. If duplicates shouldn't exist, don't forget to add a `UNIQUE` constraint to the table once you've removed the current duplicates. – Damien_The_Unbeliever Jul 13 '15 at 10:54
  • Thanks,will keep in mind..any suggestions on the query ? :) – Saruchi Jul 13 '15 at 10:56
  • possible duplicate of [How do I find duplicates across multiple columns?](http://stackoverflow.com/questions/8149210/how-do-i-find-duplicates-across-multiple-columns) – KarmaEDV Jul 13 '15 at 10:56

2 Answers2

0

When you want to filter duplicated rows I suggest you this type of query:

SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY Col3) As seq
    FROM yourTable) dt
WHERE (seq > 1)

Like this:

SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY siteID, reportID, [Date] ORDER BY ID) As seq
    FROM yourTable) dt
WHERE (seq > 1)
shA.t
  • 16,580
  • 5
  • 54
  • 111
0

When Row_Number() is used with Partition By clause, it can provide the SQL developer to select duplicate rows in a table

Please check the SQL tutorial on how to delete duplicate rows in SQL table Below query is what is copied from that article and applied to your requirement:

;WITH DUPLICATES AS
(
SELECT *,
  RN = ROW_NUMBER() OVER (PARTITION BY siteID, ReportID ORDER BY Date)
FROM myTable
)
DELETE FROM DUPLICATES WHERE RN > 1

I hope it helps,

Eralper
  • 6,461
  • 2
  • 21
  • 27
  • It seems that `PARTITION` should be by `siteID, ReportID, Date` and `ORDER` by some other column, like `ID`. – Vladimir Baranov Jul 13 '15 at 12:22
  • I do not agree :) As I understand from the requirement, it is enough for records to be duplicates if they have same SiteID and ReportID. On the other hand, if it is as you said, then Date should be included in the Partition By clause, and Order By field is defining the original and the duplicates. – Eralper Jul 13 '15 at 12:27