Using rowid to delete duplicate rows from an oracle database?

Question

I'm using the following query to delete duplicate rows from a table. There are only 2 columns in the table (col1 and col2).

delete from tab1 where rowid not in
(select min(rowid) from tab1 t group by col1, col2);

Is there any problem, like some side effects, if I use this query to delete duplicate rows?

Using ROWID itself has nothing wrong. That's what Oracle's SQL Developer uses for almost all internal queries. — Álvaro González, Aug 30 '11 at 08:31
I think your present approach based on rowid is right on target. My understanding is that rowid's are one of the _fastest_ ways of accessing data because it is the physical address of that row in the database. Tom Kyte endorses your initial approach [Ask Tom: Deleting Duplicate Records](http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1224636375004) — WoMo, Aug 30 '11 at 15:21

t-clausen.dk · Answer 1 · 2011-08-30T08:51:40.970

2

As far as i can tell, your query will perform slow

This should give you a better performance

delete t1
from tab1 t1
join tab1 t2
on t1.rowid > t2.rowid 
and t1.col1 = t2.col1 
and t1.col2 = t2.col2

Or this (it works on mssql, I believe oracle has the same syntax)

;WITH [CTE DUPLICATE] AS 
(
SELECT 
  ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY id) RN
FROM tab1
)
DELETE FROM [CTE DUPLICATE] WHERE RN > 1

edited Aug 30 '11 at 08:51

answered Aug 30 '11 at 08:29

t-clausen.dk

`This should give you a better performance`, provided you have indexes on the fields `col1` and `col2` – Johan Aug 30 '11 at 09:31
@Johan even without the index i believe it will give a much better performance – t-clausen.dk Aug 30 '11 at 09:44
I see what you mean, with all the `group by` going on in the original query. – Johan Aug 30 '11 at 09:46

score 0 · Answer 2 · edited May 23 '17 at 11:58

0

This query is ok. You can phrase it without rownum as described in this question:

edited May 23 '17 at 11:58

Community

answered Aug 30 '11 at 07:11

WW.

2 Answers2