Why is max(rowid) or min(rowid) is there while Deleting duplicate records?

Question

Can we delete duplicate records without using Pseudo coloumnrowid... what is meaning of max(rowid)/min(rowid) in while deleting duplicte records?

mysql or oracle? which one are you using? Yes there are several other methods answered here : https://stackoverflow.com/questions/18390574/how-to-delete-duplicate-rows-in-sql-server and https://stackoverflow.com/questions/529098/removing-duplicate-rows-from-table-in-oracle . but `ROWID` method is the most efficient one. — Kaushik Nayak, Apr 27 '18 at 05:10
If the table has a unique combination of columns then yes, you can use those to identify rows to delete. If the entire row is duplicated then you need a way to delete just one row in the set of duplicates, which is tricky in SQL (it becomes easier in PL/SQL). In this case `rowid` is useful. — William Robertson, Apr 27 '18 at 11:39

score 1 · Answer 1 · answered Apr 27 '18 at 08:01

ROWID is the internal row identifier used by Oracle to locate the physical record. So even though you may have repeated values for your "ID", each record ROWID will still be unique.

create table prices(
   id       varchar2(15) not null
  ,price    number       not null
  ,upd_date date         not null
--  ,primary key(id)
);

ROWID                ID PRICE UPD_DATE
------------------   -- ----- ----------
AChTgbADaAAFgxYAAA   A  7     2018-04-10

AChTgbADaAAFgxYAAB   B  8     2018-04-09
AChTgbADaAAFgxYAAC   B  8     2018-04-09
AChTgbADaAAFgxYAAD   B  8     2018-04-09

AChTgbADaAAFgxYAAE   C  9     2018-04-06
AChTgbADaAAFgxYAAF   C  8     2018-04-05
AChTgbADaAAFgxYAAG   C  7     2018-04-04

The MAX(rowid) in a group is often the most recently inserted record, but this assumption is wrong too often to rely on in production code. It can only be relied on to delete a perfect duplicate. A perfect duplicate is one where select distinct * results in one record. For all other uses, you need a discriminator. A discriminator column can be used to tell apart two records for example with an update date that indicates the time of modification.

Should you de-duplicate my example table with the typical ROWID approach, you would incorrectly delete the most recent price 9 (as evidenced by upd_date).

delete
  from prices
 where rowid not in(
        select max(rowid)
          from prices
      group by id);

A better approach would be to make use of the discriminator first, and then as last resort use the ROWID.

delete 
  from prices
 where rowid in(
        select rid
          from (select rowid as rid
                      ,row_number() over(            -- Assign a sequence number
                          partition by id            -- Group rows by ID
                              order by upd_date desc -- Sort them by upd_date first  
                                      ,rowid desc    -- Then by ROWID
                      ) as rn
                   from prices
               )
     -- The most recent record will be rn = 1.
     -- The second most recent record will be rn = 2, etcetera
        where rn > 1 -- Select only the duplicates ("after" the most recent one record
       );

score 0 · Answer 2 · answered Apr 27 '18 at 12:09

If the entire row is duplicated and you want to delete all but one copy, there is no easy way in SQL to pick the rows you want to delete without using system-generated row address.

Using Ronnis' PRICES table as an example, we see that there are three rows for B that are exact duplicates:

ID PRICE UPD_DATE
-- ----- -----------
A      7 10/04/2018
B      8 09/04/2018
B      8 09/04/2018
B      8 09/04/2018
C      7 04/04/2018
C      8 05/04/2018
C      9 06/04/2018

Although we might use something like

delete prices where id = 'B' and rownum <= 2;

this is not a good solution, as we have to know the IDs and counts, and apply to one ID at a time.

We can delete them without explicitly specifying the rowid using PL/SQL:

declare
    cursor c_prices is
        select id, price
              , row_number() over (partition by id order by upd_date desc) as seq
        from   prices
        for update;
begin
    for r in c_prices
    loop
        if r.seq > 1 then
            delete prices where current of c_prices;
        end if;
    end loop;
end;

although of course internally the where current of syntax is using the rowid.

Using rowid explicitly makes this much simpler:

delete prices where rowid in
( select lag(rowid) over (partition by id order by upd_date) from prices );

This finds all the "previous" rowids in order of date, and deletes the corresponding rows. The last row in each set won't appear in that list and so will not be deleted.

Why is max(rowid) or min(rowid) is there while Deleting duplicate records?

2 Answers2