
Consider loading a table from a flat file; the table has no constraints or indexes defined. The load was interrupted partway through, and some time later the table was loaded again from the same file, so the records inserted during the first load are now duplicated. How do I find the duplicate rows? Assume the table has 150 columns, so grouping by each and every column is tedious.

Sundararaman P
  • Is there a primary key, or even fields which indicate a unique constraint, if only logical? Why not just delete all the records and start over? Or use a natural join, a CTE and a row number? – xQbert Dec 26 '14 at 20:58
  • Maybe calculate the hash value of your record at insert; you can use the hash value to quickly find duplicates. You can even use this hash as a primary key if you like, so you don't insert duplicates. – Tien Dinh Dec 27 '14 at 20:11
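The hash suggestion in the comment above can be sketched as follows. This is an illustrative Python/SQLite stand-in for Oracle (the table `loaded` and its columns are invented for the demo): fingerprint every column of each row so duplicates surface without a 150-column GROUP BY.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loaded (c1 TEXT, c2 TEXT, c3 TEXT)")
# File loaded twice part-way: ('a', 'b', 'c') appears twice.
rows = [("a", "b", "c"), ("x", "y", "z"), ("a", "b", "c")]
conn.executemany("INSERT INTO loaded VALUES (?, ?, ?)", rows)

# Hash every column of each row; identical rows get identical digests,
# so a repeated digest marks a duplicate without naming any column.
seen = {}
dup_rowids = []
for rowid, *cols in conn.execute("SELECT rowid, * FROM loaded"):
    digest = hashlib.sha256(repr(cols).encode()).hexdigest()
    if digest in seen:
        dup_rowids.append(rowid)   # keep the first occurrence, flag the rest
    else:
        seen[digest] = rowid

print(dup_rowids)  # → [3]: the rowid of the duplicated copy
```

In Oracle the same fingerprint could be computed at load time (as the comment suggests) and even made a primary key, so the second load would reject the duplicates outright.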

6 Answers


A record is truly a duplicate only if all the column values match; it becomes unique if even one column has a different value. Since your table has no primary key or unique constraints, you must compare all columns. An alternative is to do your second load into a new temp table and populate your old table only with the records from the temp table that do not already exist in the old table. Either way, you have to compare all columns between the two tables to identify the truly unique records.

You could also consider adding a primary key to your table and then running your delete query. Check the accepted answer on this link
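The temp-table idea from this answer can be sketched with a minimal Python/SQLite stand-in (the tables `old_t` and `temp_t` and their two columns are assumptions; in practice the anti-join predicate would cover all 150 columns): select from the reload only the rows not already in the old table, then append them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_t (c1 TEXT, c2 TEXT)")   # interrupted first load
conn.execute("CREATE TABLE temp_t (c1 TEXT, c2 TEXT)")  # full second load
conn.executemany("INSERT INTO old_t VALUES (?, ?)", [("a", "1")])
conn.executemany("INSERT INTO temp_t VALUES (?, ?)", [("a", "1"), ("b", "2")])

# Anti-join on every column: fetch only the rows missing from old_t,
# then append them, so nothing gets duplicated.
missing = conn.execute("""
    SELECT * FROM temp_t t
     WHERE NOT EXISTS (SELECT 1 FROM old_t o
                        WHERE o.c1 = t.c1 AND o.c2 = t.c2)
""").fetchall()
conn.executemany("INSERT INTO old_t VALUES (?, ?)", missing)

print(conn.execute("SELECT COUNT(*) FROM old_t").fetchone()[0])  # → 2
```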

Xion

You can use ROWID to find duplicate rows (and then delete them):

SELECT * FROM table_name A
 WHERE A.rowid > ANY (
       SELECT B.rowid
         FROM table_name B
        WHERE A.col1 = B.col1
          AND A.col2 = B.col2
       );

Here is a useful link: http://www.dba-oracle.com/t_delete_duplicate_table_rows.htm

void

Tested... Appears to work...

First we get a list of the table's columns as a comma-separated string:

SELECT wm_concat(column_name)
FROM all_tab_cols
WHERE table_name = 'TABLENAME'
  AND column_id IS NOT NULL;

Copy the result into the query below where ResultList appears, and adjust TableName to your table.

WITH CTE AS (SELECT TN.*, ROWNUM RN FROM TableName TN)
SELECT * FROM CTE A
 INNER JOIN CTE B USING (ResultList)
 WHERE A.RN <> B.RN;

The above uses a natural join on all the table's columns against the same table; since duplicate rows get different row numbers, the result set lists both offending records.
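A minimal runnable sketch of the same idea, using Python/SQLite, where rowid plays the role of the ROWNUM column and two made-up columns stand in for the generated ResultList:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tn (c1 TEXT, c2 TEXT)")
conn.executemany("INSERT INTO tn VALUES (?, ?)",
                 [("a", "1"), ("b", "2"), ("a", "1")])  # one duplicated row

# Number the rows, then self-join on every data column; duplicates have
# different row numbers, so both offending copies come back.
dups = conn.execute("""
    WITH cte AS (SELECT rowid AS rn, c1, c2 FROM tn)
    SELECT a.rn, b.rn FROM cte a
      JOIN cte b ON a.c1 = b.c1 AND a.c2 = b.c2 AND a.rn <> b.rn
""").fetchall()
print(sorted(dups))  # → [(1, 3), (3, 1)]: both copies of the duplicate
```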

xQbert

I got this snippet somewhere along the line for deleting dups:

DELETE FROM TABLE_NAME
    WHERE ROWID IN
      (SELECT ROWID FROM TABLE_NAME
       MINUS
       SELECT MIN(ROWID) FROM TABLE_NAME
       GROUP BY <column list> );

Note that <column list> is the list of columns used to determine uniqueness.
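The DELETE above can be exercised end to end. Here is a sketch under Python/SQLite, where EXCEPT stands in for Oracle's MINUS and (c1, c2) is an assumed two-column table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 TEXT, c2 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a", "1"), ("b", "2"), ("a", "1")])  # ('a','1') loaded twice

# All rowids minus the first rowid of each identical group leaves
# exactly the duplicate copies, which are then deleted.
conn.execute("""
    DELETE FROM t
     WHERE rowid IN (SELECT rowid FROM t
                     EXCEPT
                     SELECT MIN(rowid) FROM t GROUP BY c1, c2)
""")
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # → 2
```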

Gary_W
  • Yes, I am aware of this method. As I mentioned in the question, if the table contains 150 columns it is tedious to add each of them to the GROUP BY list; also this is a table with no constraints or indexes defined on it. In such a case, is there any means of achieving the same? – Sundararaman P Dec 26 '14 at 21:49
  • If you loaded with SQL*Loader, the log file will tell you how many rows were loaded. When reloading, use the SKIP= option to skip what was already loaded. Another way to ease the hassle if you ever have to reload: make sure your table has a load_date column or a load-sequence ID you set for each load. Then it's a snap to delete where the date or ID matches the last load, then reload. – Gary_W Dec 26 '14 at 22:00
SELECT * FROM table_name A
 WHERE A.rowid > (
       SELECT MIN(B.rowid)
         FROM table_name B
        WHERE A.col1 = B.col1
          AND A.col2 = B.col2
       );
Eric Hauenstein
arpit

Suppose you have a table dummd (the table into which you loaded the records from the flat file) with many columns (say 150, and you are not sure which column is unique or primary) and duplicate rows. To find all the unique records you can use UNION, and then create a view or a new table, as I did with test1:

 create table test1 
 as
 select * from dummd
 union
 select * from dummd
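A runnable sketch of this answer under Python/SQLite (dummd and its two columns are invented for the demo): UNION, unlike UNION ALL, discards duplicate rows, so test1 ends up with only the unique records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dummd (c1 TEXT, c2 TEXT)")
conn.executemany("INSERT INTO dummd VALUES (?, ?)",
                 [("a", "1"), ("a", "1"), ("b", "2")])  # ('a','1') duplicated

# UNION of the table with itself removes duplicates across all columns.
conn.execute("""
    CREATE TABLE test1 AS
    SELECT * FROM dummd
    UNION
    SELECT * FROM dummd
""")
print(conn.execute("SELECT * FROM test1 ORDER BY c1").fetchall())
# → [('a', '1'), ('b', '2')]
```

Note this gives you the deduplicated data in a new table; it does not tell you which rows were the duplicates.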
Dharmesh Porwal