I have a very large table (several million records). SOME records have duplicates (based on FieldA) where the only difference is a value in FiedldB. I'd like to create a query that will remove all duplicate records based on FieldA, keeping the one with the lowest value in FieldB. Is this possible?
Asked
Active
Viewed 437 times
1
-
Yes, this is possible. – wildplasser Apr 04 '15 at 13:29
2 Answers
1
CREATE TABLE TABLE1
(
FieldA VARCHAR2(30),
FieldB VARCHAR2(30),
FieldC VARCHAR2(30)
);
INSERT INTO TABLE1 VALUES
('DUMMYDATA-A1','DUMMYDATA-B1','DUMMYDATA-C1'
);
INSERT INTO TABLE1 VALUES
('DUMMYDATA-A1','DUMMYDATA-B4','DUMMYDATA-C1'
);
INSERT INTO TABLE1 VALUES
('DUMMYDATA-A1','DUMMYDATA-B3','DUMMYDATA-C1'
);
INSERT INTO TABLE1 VALUES
('DUMMYDATA-A1','DUMMYDATA-B2','DUMMYDATA-C1'
);
COMMIT;
SELECT FieldA,
FieldB,
FieldC,
RANK() OVER( PARTITION BY FieldA ORDER BY FieldB ASC) AS COLUMN_ALIAS
FROM TABLE1; --IDENTIFIES DUPLICATES BASED ON RANK VALUE
---PERFORM DELETE
DELETE
FROM TABLE1
WHERE ROWID IN
(SELECT ROWID
FROM
(SELECT ROWID,
RANK() OVER( PARTITION BY FieldA ORDER BY FieldB ASC) AS COLUMN_ALIAS
FROM TABLE1
)
WHERE COLUMN_ALIAS>1
);
COMMIT;
SELECT * FROM TABLE1; -- CONTAINS A SINGLE RECORD
RANK function can identify the duplicate records and facilitate deleting only the duplicated record , retaining the original row. This has been already discussed here : Deleting duplicates rows from oracle . Hope this helps
However as DELETE itself is slower, proper constraints can be implemented in this scenario (contains millions of records) at the INSERT to avoid duplicate entry.

Community
- 1
- 1

Krishnendu
- 125
- 3
- 13
0
Seems pretty straightforward to extract these values:
select distinct a,
min(b) b
from t
group by a;
Fiddle for example: http://sqlfiddle.com/#!9/bc4c9/3
You should be able to adapt a removal method from this.

Hunter McMillen
- 59,865
- 24
- 119
- 170
-
I ran this using limit 100 and it ran fine. When I ran w/o it maxes out tmp file very quickly. Is there a way to run a query like this where it flushes tmp or am I going to manually run in batches e.g Where ID>0 and ID <100? That will sort of suck as there are 1/2 million records or so. – user3649739 Apr 05 '15 at 13:34
-
Why are you writing to a temp file? I thought you wanted to delete records. – Hunter McMillen Apr 05 '15 at 13:38