15

I have a project that needs to occasionally delete several tens of thousands of rows from one of six tables of varying sizes, which hold about 30 million rows between them. Because of the structure of the data I've been given, I don't know which of the six tables contains any given row, so I have to run every delete against all six tables. I've built an index on the ID column to try and speed things up, but it can be removed if that would make things faster.

My problem is that I can't seem to find an efficient way to actually perform the delete. For the purposes of my testing, I'm running 7384 deletes against a single test table which has about 9400 rows. I've tested a number of possible query solutions in Oracle SQL Developer:

7384 separate DELETE statements took 203 seconds:

delete from TABLE1 where ID=1000001356443294;
delete from TABLE1 where ID=1000001356443296;
etc...

7384 separate SELECT statements took 57 seconds:

select ID from TABLE1 where ID=1000001356443294;
select ID from TABLE1 where ID=1000001356443296;
etc...

7384 separate DELETE from (SELECT) statements took 214 seconds:

delete from (select ID from TABLE1 where ID=1000001356443294);
delete from (select ID from TABLE1 where ID=1000001356443296);
etc...

1 SELECT statement with 7384 OR clauses in the WHERE took 127.4 seconds:

select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...

1 DELETE from (SELECT) statement with 7384 OR clauses in the WHERE took 74.4 seconds:

delete from (select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...)

While the last may be the fastest, upon further testing it's still very slow when scaled up from the 9400-row table to even just a 200,000-row table (still less than 1% of the final tableset size), where the same statement takes 14 minutes to run. While more than 50% faster per row, that still extrapolates to about a day when run against the full dataset. I have it on good authority that the piece of software we used to use for this task could do it in about 20 minutes.

So my questions are:

  • Is there a better way to delete?
  • Should I use a round of SELECT statements (i.e., like the second test) to discover which table any given row is in, and then fire off delete queries? Even that looks quite slow, but...
  • Is there anything else I can do to speed the deletes up? I don't have DBA-level access or knowledge.
GIS-Jonathan
  • try `delete from TABLE1 where ID in (1000001356443294,1000001356443296,...)` – CapelliC Apr 10 '12 at 16:17
  • You're deleting 30 million rows from these six tables. How many rows, ballpark, will be left in each table? How much concurrent write activity is required to be supported while this is going on? – Adam Musch Apr 10 '12 at 18:31
  • @AdamMusch - I'm only deleting a few tens of thousands of rows from the 30 million. Far less than 1% of the rows are going to be deleted. There won't be any writing going on whatsoever during this period (but probably lots of concurrent reading). – GIS-Jonathan Apr 11 '12 at 09:00
  • @chac - Just tried this and it doesn't work. I get the error: `ORA-01795: maximum number of expressions in a list is 1000` (there are over 7000 just in this test query) – GIS-Jonathan Apr 11 '12 at 09:06
  • To delete 1000, how much time is required? That should scale more or less linearly... – CapelliC Apr 11 '12 at 09:25
  • @Chac - 3 seconds. Wow. Assuming a linear scale that'd give about 22 seconds for the whole lot, 3 times faster than the next fastest! Of course, I'm not sure the tool I'm using will allow me to easily split it into 1000 item lists, but this is probably more workable than the other solutions presented. Thanks. – GIS-Jonathan Apr 11 '12 at 09:43
  • @chac - upon further testing this doesn't scale to larger tables. It takes 3s to run against the 9400 rows but 163s to run against the 200,000 row table. That's 54 times longer for a table that's only 20 times larger. – GIS-Jonathan Apr 11 '12 at 10:15
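
A chunked variant of CapelliC's suggestion above that stays under the ORA-01795 limit of 1000 expressions per IN list (ID values illustrative, lists elided):

delete from TABLE1
 where ID in (1000001356443294, 1000001356443296, ... /* first 1000 IDs */)
    or ID in (... /* next 1000 IDs, and so on */);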

4 Answers

15

In advance of my questions being answered, this is how I'd go about it:

Minimize the number of statements issued and the relative amount of work they do.

All scenarios assume you have a table of the IDs to be deleted (PURGE_IDS), to be purged from TABLE1, TABLE2, etc.

Consider Using CREATE TABLE AS SELECT for really large deletes

If there's no concurrent activity, and you're deleting 30% or more of the rows in one or more of the tables, don't delete; perform a CREATE TABLE AS SELECT of the rows you wish to keep, and swap the new table in for the old one. INSERT /*+ APPEND */ ... NOLOGGING is surprisingly cheap if you can afford it. Even if you do have some concurrent activity, you may be able to use Online Table Redefinition to rebuild the table in-place.
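
A sketch of that swap, assuming no concurrent activity; TABLE1_KEEP and TABLE1_OLD are illustrative names, and any indexes, constraints and grants would need to be recreated on the new table before the swap:

CREATE TABLE TABLE1_KEEP NOLOGGING AS
  SELECT *
    FROM TABLE1
   WHERE ID NOT IN (SELECT ID FROM PURGE_IDS);

-- recreate indexes, constraints and grants on TABLE1_KEEP here, then:
ALTER TABLE TABLE1 RENAME TO TABLE1_OLD;
ALTER TABLE TABLE1_KEEP RENAME TO TABLE1;
DROP TABLE TABLE1_OLD;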

Don't run DELETE statements you know won't delete any rows

If an ID value exists in at most one of the six tables, then keep track of which IDs you've deleted - and don't try to delete those IDs from any of the other tables.

CREATE TABLE TABLE1_PURGE NOLOGGING
AS 
SELECT PURGE_IDS.ID FROM PURGE_IDS INNER JOIN TABLE1 ON PURGE_IDS.ID = TABLE1.ID;

DELETE FROM TABLE1 WHERE ID IN (SELECT ID FROM TABLE1_PURGE);

DELETE FROM PURGE_IDS WHERE ID IN (SELECT ID FROM TABLE1_PURGE);

DROP TABLE TABLE1_PURGE;

and repeat.

Manage Concurrency if you have to

Another way is to use PL/SQL, looping over the tables and issuing a rowcount-limited delete statement. This is most likely appropriate if there's significant concurrent insert/update/delete load against the tables you're running the deletes against.

declare
  l_sql varchar2(4000);
begin
  for i in (select table_name from all_tables 
             where table_name in ('TABLE1', 'TABLE2', ...)
             order by table_name)
  loop
    l_sql := 'delete from ' || i.table_name || 
             ' where id in (select id from purge_ids) ' || 
             '   and rownum <= 1000000';
    loop
      commit;
      execute immediate l_sql;
      exit when sql%rowcount <> 1000000;  -- if we delete less than 1,000,000
    end loop;                             -- no more rows need to be deleted!
  end loop;
  commit;
end;
/
Adam Musch
  • Thanks for the options. **(1)** I'm not deleting many rows (<1% as pointed out above) so duplicating the table would probably be silly. **(2)** Your second option looks good, as an ID should only be in one table, but I'm not clear on how to use it. Where does PURGE_IDS come from? It looks like it's a separate table from TABLE1_PURGE, but then why create a new table when the IDs are already in one (PURGE_IDS)? **(3)** As noted in my comment, nothing at all will be written during this process. There will be some `SELECT` statements running. – GIS-Jonathan Apr 11 '12 at 10:56
  • I'm assuming there's some set of business rules where you capture the IDs that you'd use to delete rows from the set of tables. The "advantage" of creating TABLE1_PURGE would be that it would capture the IDs deleted from TABLE_1 before you delete them, so you can eliminate those IDs from future purges. – Adam Musch Apr 11 '12 at 13:04
  • I just tested this and it seems I didn't fully understand what it was doing (hence the comment-question). Your specified Method 2 is incredibly fast! On the 200,000 row table it took less than 3 seconds! Against a 3 million row table (same 7000 deletes) it was just 36s! That's a massive speed increase compared to all of the other methods tried, I didn't realise there was that much variability. Many thanks! – GIS-Jonathan Apr 11 '12 at 13:34
  • I am a little confused about the second option. There are 4 statements: a create, 2 deletes, and a drop. Do you perform the create and drop once for the entire operation and the deletes n times, or perform all of them n times? – sisharp Jul 10 '13 at 15:02
  • You'd repeat the four statements for each table one is looking to delete from. So when purging from `TABLE2`, you'd create table `TABLE2_PURGE` and so forth. – Adam Musch Jul 17 '13 at 03:12
1

Store all the IDs to be deleted in a table. Then there are three ways:

1) Loop through the IDs in the table, deleting one row at a time and committing every X rows, where X can be 100 or 1000. This works in an OLTP environment and you can control the locks.

2) Use Oracle bulk delete (see the sketch below).

3) Use a correlated delete query.

A single query is usually faster than multiple queries because of less context switching, and possibly less parsing.
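
A minimal sketch of option 2 (bulk delete), assuming the IDs have been stored in a PURGE_IDS table as in the accepted answer; table and column names are illustrative:

declare
  type t_ids is table of table1.id%type;
  l_ids t_ids;
begin
  select id bulk collect into l_ids
    from purge_ids;

  -- FORALL sends all the deletes to the SQL engine in one round trip
  forall i in 1 .. l_ids.count
    delete from table1
     where id = l_ids(i);

  commit;
end;
/

For very large ID lists, BULK COLLECT with a LIMIT clause inside a loop would keep memory use bounded.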

0

First, disabling the index during the deletion would be helpful.
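
For instance, assuming a non-unique index named TABLE1_ID_IX (an illustrative name), it could be marked unusable and rebuilt afterwards; note the caveat in the comments below that this is counter-productive when the delete's WHERE clause relies on that very index:

ALTER INDEX TABLE1_ID_IX UNUSABLE;  -- DML then skips it (skip_unusable_indexes defaults to true)
-- run the deletes here ...
ALTER INDEX TABLE1_ID_IX REBUILD;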

Try a MERGE INTO statement:
1) Create a temp table with the IDs and an additional column from TABLE1, and test with the following:
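
A hypothetical setup for that temp table, assuming TABLE1 has a column COL1 holding the extra value and the IDs to delete are in PURGE_IDS (names illustrative):

CREATE GLOBAL TEMPORARY TABLE test_merge_delete (
  id   NUMBER,
  col1 VARCHAR2(100)
) ON COMMIT PRESERVE ROWS;

INSERT INTO test_merge_delete (id, col1)
  SELECT t.id, t.col1
    FROM TABLE1 t
   WHERE t.id IN (SELECT id FROM PURGE_IDS);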

MERGE INTO table1 tgt
USING (SELECT id, col1
         FROM test_merge_delete) src
ON (tgt.id = src.id)
WHEN MATCHED THEN
  UPDATE
     SET tgt.col1 = src.col1
  DELETE
   WHERE tgt.id = src.id;
mcha
  • Disabling the index during the deletion sounds great - unless that's being used to enforce uniqueness. – Adam Musch Apr 10 '12 at 18:28
  • Dropping the index would be counter-productive if the index is on the column specified in the `where` clause. – Allan Apr 10 '12 at 21:43
  • @AdamMusch The index isn't enforcing uniqueness (though it is set to be UNIQUE). The reason it was created was because my DBA said that without it Oracle would be performing a full table scan for each `WHERE` clause; as @Allan surmised, this index is the same column as used in the `where`. There are two other indexes, but both of those are required for applications. – GIS-Jonathan Apr 11 '12 at 09:03
  • beware to index rebuild, could be a real pain – CapelliC Apr 11 '12 at 09:23
  • @chac - Is an index rebuild actually necessary? My general readings imply that the index is auto-updated after data changes anyway. I found this [Oracle blog post](https://blogs.oracle.com/sysdba/entry/when_to_rebuild_index) but it doesn't make much sense to me (this should probably be a separate question). – GIS-Jonathan Apr 11 '12 at 09:47
  • If the statistics are representative, then Oracle will automatically not use the index (when it makes sense, which it does when processing a lot of rows). – David Balažic Jun 10 '16 at 14:59
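
For reference, a minimal sketch of refreshing those statistics with DBMS_STATS (schema and table names illustrative):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,     -- current schema
    tabname => 'TABLE1',
    cascade => TRUE);    -- gather index statistics too
END;
/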
0

I have tried this code and it's working fine in my case.

-- Deletes the duplicate row (the second one, ordered by rowid descending)
-- for the given wi_name, keeping the other.
DELETE FROM NG_USR_0_CLIENT_GRID_NEW WHERE rowid IN
( SELECT rid FROM
  (
      SELECT rowid AS rid, wi_name, relationship,
             ROW_NUMBER() OVER (ORDER BY rowid DESC) RN
      FROM NG_USR_0_CLIENT_GRID_NEW
      WHERE wi_name = 'NB-0000001385-Process'
  )
  WHERE RN=2
);