0

I searched on the internet but could not find any explanation for my question below. The following code is not working when I try to remove duplicate rows from a test table. The test table has no unique or primary keys on it, and is defined as test(id number). Can you explain me why it is not working?

delete from test a 
where a.rowid in (select b.rowid 
                  from test b 
                  where b.id = a.id 
                  and b.rowid != a.rowid);
Charlesliam
  • 1,293
  • 3
  • 20
  • 36
  • this might help you clear out about rowid. http://stackoverflow.com/questions/2701782/rowid-oracle-any-use-for-it – Charlesliam Jul 08 '14 at 12:46
  • possible duplicate of [How can I remove duplicate rows?](http://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows) – Daniel Gadawski Jul 08 '14 at 12:48
  • 1
    What does "not working" mean? – Gordon Linoff Jul 08 '14 at 12:58
  • Charlesliam I think I know what a rowid is, and Daniel no this is not a duplicate question, I am just searching for an explanation. And yet, I could not find a logical explanation why the above code is not working. If you provide me with a nice logical explanation I will be grateful. Thanks in advance. – user3816289 Jul 08 '14 at 13:03
  • 'Not working' means it does not remove the duplicate rows. – user3816289 Jul 08 '14 at 13:04

2 Answers2

2

The normal way this query would work is more like:

delete from test a 
where a.rowid > (select min(b.rowid)
                 from test b 
                 where b.id = a.id
                );

Your query is removing no rows because of the correlation. You are asking for a.rowid being in a list where it is specifically excluded by b.rowid != a.rowid. Hence, no rows can match this condition -- of being in a list they are specifically excluded from.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • No, inner subquery just gives the rowid's that are different from the ones in the outer delete. And when you run the query in question, it just deletes 0 rows. – user3816289 Jul 08 '14 at 13:11
  • As far as I know, the inner subquery should return the result for each row of the table in the outer delete statement, and only after that the delete statement should execute. For instance, for the first row of the table in the outer delete, the delete statement should send the rowid of the first row to the inner subquery and then wait for the result that should return. And then the delete statement executes, in which case the subquery should return **the rowid's other than the one that is passed to subquery** (ie. `b.rowid != a.rowid`). Am I wrong? What is the error in this statement – user3816289 Jul 09 '14 at 10:03
  • When I run the subquery with replacing actual values of a.rowid and a.id from the queried table (ie. `select b.rid from test2 b where b.id = 1 and b.rid != 'XXX'`), the subquery returns exactly as expected (ie, it gives **the rowid's other than the one that is passed to subquery**). Yet using it in the delete statement as a subquery does not work. What is the reason for that? – user3816289 Jul 09 '14 at 10:14
  • @Gordon...Ok now I see why this sql is not working thank you Gordon.. I thought that the left hand side of the "in" is just for searching the rowid's in the list that is returned by the select statement. In which case the "in" statement cannot be defined as a logical expression. – user3816289 Jul 10 '14 at 06:41
1

This should help:

delete from test where rowid in
(
    select rowid from
      (select rowid,rank() over (partition by id order by rowid) rank 
       from test)temp_test 
    where temp_test.rank >1
);
dips
  • 76
  • 4