6

We have a table with about 1.5 million records. This table has a lot of FK relations to from different tables.

The problem is that 1 million record just duplicates which have to be deleted. We try to delete 1000 records at the time, but it's a very slow process.

What I have in mind is to copy temporarily records that have to stay to a new table. Truncate existing one and copy records that have to stay back. With restoring primary key and all relations to the other tables. So from client side you cannot see any difference.

Not sure if it's an efficient way or not.

If it's I would love to see basic implementation of it so I can follow and apply to my case. If not I would like to see efficient way of doing it.

Thank you

German
  • 740
  • 2
  • 10
  • 24
  • 1
    Is this a once off cleanup, or does your system make a habit of duplicating data? – Trent Jun 29 '13 at 16:39
  • Do you have an index on the column that you filter out rows that you want to delete ? – Ilkka Jun 29 '13 at 16:41
  • Yes, we do have an index – German Jun 29 '13 at 16:41
  • So basically it's fast to make the query but slow to delete ? Have you tried to just select the rows ? – Ilkka Jun 29 '13 at 16:42
  • 2
    disable the indexes, then copy the unique data 500K into a table B, then truncate and drop the original table, then rename the B to the original name, and go for indexes –  Jun 29 '13 at 16:48
  • what about the PK and FK relation it has? – German Jun 29 '13 at 16:49
  • use set identity_insert on http://msdn.microsoft.com/en-us/library/ms188059.aspx – Ian Kenney Jun 29 '13 at 16:52
  • I'm wondering why it's slow to delete even only 1000 rows. How do you identify duplicate rows, what is your implementation ? – Ilkka Jun 29 '13 at 16:53
  • you misunderstood me, it's not slow to delete 1000 rows. In general slow process when you have to delete 1 million rows – German Jun 29 '13 at 16:54
  • 1
    @German, just like indexes, add relationship too, or if the relations are to much and complex, disable the indexes, move the duplicate data into table B then recreate the indexes (no enable it), I think, indexes are the reason you find the process as snail move –  Jun 29 '13 at 16:58
  • Is the table actively in use while you're deleting? – John Tseng Jun 29 '13 at 17:12
  • 1
    You cannot truncate a table with FKs on it. – RBarryYoung Jun 29 '13 at 17:44

2 Answers2

2

Our company has a bunch of temporary data stored in databases. When we need to delete a bunch of them, we break it up into a few hundred rows and delete them chunks at a time. We have an application whose sole purpose in life is to run a few queries like this over and over again:

with topFew as (select top 100 * from table) delete topFew

I suggest you whip up something simple like this, and just let it run for a few hours. Go work on something else while it's processing.

John Tseng
  • 6,262
  • 2
  • 27
  • 35
  • yes, good point, it could be done by a 3rd. party application, send each delete operation as a low_priority transaction to the database, I tried to vote up, but system told enough for today :D –  Jun 29 '13 at 17:34
1

Performance of the delete can be improved by self joining the table using rowid. It can be even optimized by using a bulk collect and FORALL

     DECLARE

     limit_in integer;
     CURSOR C1 is
     Select min(b.rowid) 
       from table_name a, table_name b
      where a.primary_key = b.primary_key;

      TYPE C1_rec IS TABLE OF C1%ROWTYPE
      INDEX BY PLS_INTEGER;

     C1_record C1_rec

     BEGIN
     limit_in:=10000  --- Can be changed based on performance
     OPEN C1;
       LOOP
        FETCH C1 BULK COLLECT INTO C1_record LIMIT limit_in;
        FORALL indx in 1..c1_record.count
         DELETE FROM table_name where row_id = C1_record(i);
         commit;
       END LOOP;
     END;

The table that is to be deleted has child tables, So there will be a constraint Violation.

So before executing the above piece of code, It is a better option to alter the foreign key constraint TO HAVE DELETE CASCADE. We cannot modify a constraint to add delete cascade. So the foreign key should be dropped and recreated to have delete cascade

    ALTER child_table
    ADD CONSTRAINT fk_name
    foreign_key (C1)
    references parent_table (C2) on delete cascade;

Delete cascade would clean up your child tables as well..

Valli
  • 1,440
  • 1
  • 8
  • 13