Can we delete duplicate records without using Pseudo coloumnrowid
...
what is meaning of max(rowid)/min(rowid) in while deleting duplicte records?

- 1
- 1
- 4
-
1mysql or oracle? which one are you using? Yes there are several other methods answered here : https://stackoverflow.com/questions/18390574/how-to-delete-duplicate-rows-in-sql-server and https://stackoverflow.com/questions/529098/removing-duplicate-rows-from-table-in-oracle . but `ROWID` method is the most efficient one. – Kaushik Nayak Apr 27 '18 at 05:10
-
If the table has a unique combination of columns then yes, you can use those to identify rows to delete. If the entire row is duplicated then you need a way to delete just one row in the set of duplicates, which is tricky in SQL (it becomes easier in PL/SQL). In this case `rowid` is useful. – William Robertson Apr 27 '18 at 11:39
2 Answers
ROWID
is the internal row identifier used by Oracle to locate the physical record. So even though you may have repeated values for your "ID", each record ROWID will still be unique.
create table prices(
id varchar2(15) not null
,price number not null
,upd_date date not null
-- ,primary key(id)
);
ROWID ID PRICE UPD_DATE
------------------ -- ----- ----------
AChTgbADaAAFgxYAAA A 7 2018-04-10
AChTgbADaAAFgxYAAB B 8 2018-04-09
AChTgbADaAAFgxYAAC B 8 2018-04-09
AChTgbADaAAFgxYAAD B 8 2018-04-09
AChTgbADaAAFgxYAAE C 9 2018-04-06
AChTgbADaAAFgxYAAF C 8 2018-04-05
AChTgbADaAAFgxYAAG C 7 2018-04-04
The MAX(rowid) in a group is often the most recently inserted record, but this assumption is wrong too often to rely on in production code. It can only be relied on to delete a perfect duplicate. A perfect duplicate is one where select distinct *
results in one record. For all other uses, you need a discriminator. A discriminator column can be used to tell apart two records for example with an update date that indicates the time of modification.
Should you de-duplicate my example table with the typical ROWID approach, you would incorrectly delete the most recent price 9 (as evidenced by upd_date).
delete
from prices
where rowid not in(
select max(rowid)
from prices
group by id);
A better approach would be to make use of the discriminator first, and then as last resort use the ROWID.
delete
from prices
where rowid in(
select rid
from (select rowid as rid
,row_number() over( -- Assign a sequence number
partition by id -- Group rows by ID
order by upd_date desc -- Sort them by upd_date first
,rowid desc -- Then by ROWID
) as rn
from prices
)
-- The most recent record will be rn = 1.
-- The second most recent record will be rn = 2, etcetera
where rn > 1 -- Select only the duplicates ("after" the most recent one record
);

- 12,593
- 2
- 32
- 52
If the entire row is duplicated and you want to delete all but one copy, there is no easy way in SQL to pick the rows you want to delete without using system-generated row address.
Using Ronnis' PRICES
table as an example, we see that there are three rows for B
that are exact duplicates:
ID PRICE UPD_DATE
-- ----- -----------
A 7 10/04/2018
B 8 09/04/2018
B 8 09/04/2018
B 8 09/04/2018
C 7 04/04/2018
C 8 05/04/2018
C 9 06/04/2018
Although we might use something like
delete prices where id = 'B' and rownum <= 2;
this is not a good solution, as we have to know the IDs and counts, and apply to one ID at a time.
We can delete them without explicitly specifying the rowid using PL/SQL:
declare
cursor c_prices is
select id, price
, row_number() over (partition by id order by upd_date desc) as seq
from prices
for update;
begin
for r in c_prices
loop
if r.seq > 1 then
delete prices where current of c_prices;
end if;
end loop;
end;
although of course internally the where current of
syntax is using the rowid.
Using rowid explicitly makes this much simpler:
delete prices where rowid in
( select lag(rowid) over (partition by id order by upd_date) from prices );
This finds all the "previous" rowids in order of date, and deletes the corresponding rows. The last row in each set won't appear in that list and so will not be deleted.

- 15,273
- 4
- 38
- 44