3

I have two rows in my table which are exact duplicates with the exception of a date field. I want to find these records and delete the older record by hopefully comparing the dates.

For example I have the following data

    ctrc_num | Ctrc_name   | some_date
   ---------------------------------------
        12345 | John R      | 2011-01-12
        12345 | John R      | 2012-01-12
        56789 | Sam S       | 2011-01-12
        56789 | Sam S       | 2012-01-12

Now the idea is to find duplicates with a different 'some_date' field and delete the older records. The final output should look something like this.

      ctrc_num | Ctrc_name   | some_date
   ---------------------------------------
        12345 | John R      | 2012-01-12
        56789 | Sam S       | 2012-01-12

Also note that my table does not have a primary key, it was originally created this way, not sure why, and it has to fit inside a stored procedure.

slimsim
  • 73
  • 1
  • 8

2 Answers2

1

If you look at this:

SELECT * FROM <tablename> WHERE some_date IN
(
SELECT MAX(some_date) FROM <tablename> GROUP BY ctrc_num,ctrc_name
HAVING COUNT(ctrc_num) > 1 
AND COUNT(ctrc_name) > 1
)

You can see it selects the two most recent dates for the duplicate rows. If I switch the select in the brackets to 'min date' and use it to delete then you are removing the two older dates for the duplicate rows.

DELETE FROM <tablename> WHERE some_date IN
(
SELECT MIN(some_date) FROM <tablename> GROUP BY ctrc_num,ctrc_name
HAVING COUNT(ctrc_num) > 1 
AND COUNT(ctrc_name) > 1
)
russ
  • 579
  • 3
  • 7
  • Does this account for the lack of a primary key in the table? – slimsim Sep 24 '15 at 20:15
  • Well of course Aasim, there is no substitute for a primary key as this would give uniqueness in each row and will make any dedupe more effective, however, as the question stands we're not given to know whether or not there is any Primary Key, we're only shown 3 columns. – russ Sep 24 '15 at 21:15
0

This is for SQL Server

CREATE TABLE StackOverFlow
    ([ctrc_num] int, [Ctrc_name] varchar(6), [some_date] datetime)
;

INSERT INTO StackOverFlow
    ([ctrc_num], [Ctrc_name], [some_date])
SELECT 12345, 'John R', '2011-01-12 00:00:00' UNION ALL
SELECT 12345, 'John R', '2012-01-12 00:00:00' UNION ALL
SELECT 56789, 'Sam S', '2011-01-12 00:00:00' UNION ALL
SELECT 56789, 'Sam S', '2012-01-12 00:00:00'

;WITH RankedByDate AS
(
SELECT ctrc_num
,Ctrc_name
,some_date
,ROW_NUMBER() OVER(PARTITION BY Ctrc_num, Ctrc_name ORDER BY some_date DESC) AS rNum
FROM StackOverFlow
)
DELETE
FROM RankedByDate
WHERE rNum > 1

SELECT
[ctrc_num]
, [Ctrc_name]
, [some_date]
FROM StackOverFlow

And here is the sql fiddle to test it http://sqlfiddle.com/#!6/32718/6

What I tried to do here is

  1. rank the records by descending order of date
  2. delete those that are older (keep the latest)
whereisSQL
  • 638
  • 2
  • 13
  • 23
  • Apologies for not mentioning this earlier, but I needed the solution only for the Oracle database. I get a missing select keyword error . The logic is perfect though. – slimsim Sep 24 '15 at 19:46
  • Here's a similar question for Oracle - http://stackoverflow.com/questions/529098/removing-duplicate-rows-from-table-in-oracle – whereisSQL Sep 24 '15 at 20:19