
I need to delete 10,000 records from a table containing 9 million records. The IDs to be deleted are fetched by a complex query and stored in a Java collection.

I have three approaches in mind to implement this:

1) Create a prepared statement, add 10,000 delete statements to the batch, and execute it.

The statement will look like this:

Delete from <table_name> where id=?;
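
For clarity, here is a rough sketch of what I mean (assuming a java.sql.Connection called conn with auto-commit off and a java.util.List<String> called ids holding the IDs; <table_name> is a placeholder):

// Option 1: one prepared statement, 10,000 batched executions
try (PreparedStatement ps = conn.prepareStatement(
        "delete from <table_name> where id = ?")) {
    for (String id : ids) {
        ps.setString(1, id);
        ps.addBatch();
    }
    ps.executeBatch();
    conn.commit();
}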

2) Write an IN query rather than using '=' in a batch.

Under this approach, the 10,000 IDs can either be built as comma-separated values in Java code and appended to the query, or inserted into a temporary table and selected from that table in a subquery:

Delete from <table_name> where id in (<CSV>);
                 or
Delete from <table_name> where id in (select id from <temp_table>);
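
For the CSV variant, the statement would be built in Java, roughly like this (again assuming conn and ids as above; the values would need quoting if id is a character column, and I'm ignoring any limit on IN-list size here):

// Option 2a: build the IN list from the collection and run a single delete
String csv = String.join(",", ids);
try (Statement st = conn.createStatement()) {
    st.executeUpdate("delete from <table_name> where id in (" + csv + ")");
    conn.commit();
}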

There are no constraints or indexes on the table, and I cannot add any because I'm working on an existing table.

The first option is taking ages to complete; it has been running for 15 hours and still has not completed.

– Shiva Mothkuri
  • Are you saying that table doesn't even have a primary key on the `id` column? – Andreas Feb 05 '16 at 07:46
  • It may sound weird, but yes! no primary key on id :) – Shiva Mothkuri Feb 05 '16 at 07:47
  • A table with an `id` but no primary key. That's just horrendous database design. No lookup can ever perform well on such a table, so what's the point? – Andreas Feb 05 '16 at 07:49
  • That's what's making my work even harder. But those tables are from a different data source which is not maintained by us. I can't just say I don't have an answer, so I'm taking help from you guys. – Shiva Mothkuri Feb 05 '16 at 07:55
  • If you can add new database objects (but not alter existing ones), you could consider adding a materialized view for your table. Unlike normal views, a materialized view can have an index even on columns that don't have an index in the underlying table. Then you can delete records in this view directly and have an additional trigger for deleting records in the underlying table. Just an idea (I haven't tried this). – dsp_user Feb 05 '16 at 09:43

4 Answers


Your first IN variant, the CSV list, has a limit of 1,000 values and tends not to perform well. The temporary-table approach may perform better, but you have to have a global temporary table, and populating it is an extra step.

Instead, you can convert your Java collection to an Oracle collection. You could create your own table type for this, but there are built-in ones like ODCINUMBERLIST which you can use here. You can then treat that as a table collection expression.

The details may vary slightly depending on your Java collection type, but the outline is something like:

ArrayDescriptor aDesc = ArrayDescriptor.createDescriptor("SYS.ODCINUMBERLIST", conn);
oracle.sql.ARRAY oraIDs = new oracle.sql.ARRAY(aDesc, conn, yourJavaCollectionOfIDs);

OracleCallableStatement cStmt = (OracleCallableStatement) conn.prepareCall(
  "Delete from <table_name>"
  + " where id in (select column_value from table(?))");
cStmt.setArray(1, oraIDs);
cStmt.execute();

Unless it is already a simple array, you will need to convert your Java collection to an array in the call; e.g. if you're using an ArrayList called yourArrayList, you would do:

oracle.sql.ARRAY oraIDs = new oracle.sql.ARRAY(aDesc, conn, yourArrayList.toArray());

You will still suffer from the lack of a primary key or index but it will give Oracle a better chance to optimise it than the CSV list (or multiple CSV lists OR'd together as you have more than 1000 IDs).
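
Putting the pieces together, a rough (untested) sketch, assuming numeric IDs and an OracleConnection called conn; switch to SYS.ODCIVARCHAR2LIST if your IDs are strings:

ArrayDescriptor aDesc = ArrayDescriptor.createDescriptor("SYS.ODCINUMBERLIST", conn);
oracle.sql.ARRAY oraIDs = new oracle.sql.ARRAY(aDesc, conn, yourArrayList.toArray());
try (OracleCallableStatement cStmt = (OracleCallableStatement) conn.prepareCall(
        "delete from <table_name>"
      + " where id in (select column_value from table(?))")) {
    cStmt.setArray(1, oraIDs);
    cStmt.execute();
    conn.commit();
} finally {
    oraIDs.free();   // release the array once the delete is done
}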

– Alex Poole
  • Thanks Alex for the response. I tried to implement your approach with the built-in types, but ended up with 'java.sql.SQLException: Fail to convert to internal representation'. I used types like 'ODCINUMBERLIST' and 'ODCIVARCHAR2LIST'. Any help? My IDs are not numbers, they are Strings. They look like **1:123**. – Shiva Mothkuri Feb 09 '16 at 16:05
  • @ShivaMothkuri - in the createDescriptor call, are you using ODCIVARCHAR2LIST or SYS.ODCIVARCHAR2LIST? How is your Java collection defined? – Alex Poole Feb 09 '16 at 16:18
  • I used SYS.ODCIVARCHAR2LIST. My collection is an ArrayList. Could the colon (:) in the string affect something? – Shiva Mothkuri Feb 09 '16 at 16:31
  • @ShivaMothkuri - I think you're just missing a toArray() call to convert your ArrayList to an array. I've added that to the answer. – Alex Poole Feb 09 '16 at 18:33
  • Oh yes! You're right, I was missing the toArray() call. It's working now, thanks. With this approach the deletes run at about 10,000 records per 3 seconds. One thing to confirm from you: do I need to drop these Oracle arrays manually or will they be removed automatically? – Shiva Mothkuri Feb 10 '16 at 15:55
  • If I change the batch size to 1000 from 10 000, theoretically, will there be any performance improvement? – Shiva Mothkuri Feb 10 '16 at 15:57
  • @ShivaMothkuri - I don't think batch size will come into it for a single delete command; I'd expect a single call with a 10,000 element array to perform slightly better at the DB end than ten calls of 1,000 each, but you're increasing memory usage and the network traffic might make it hard to discern. Only way to tell is to try it really. And you should call `oraIDs.free()` to clean up, yes. – Alex Poole Feb 10 '16 at 16:10

You should not use the first option of executing 10,000 statements from your Java code.

Creating a temp table is a good idea. But in Oracle you cannot have an IN (...) list with more than 1,000 items, so your CSV approach may not succeed.

You may go for

Delete from <table_name> where id in (select id from <temp_table>);

but this is not optimal either. It would be better to change your delete statement to this:

Delete from <table_name> m where exists (select id from <temp_table> t where m.id = t.id);

But if you do such operations frequently, it's highly recommended to add some constraints and indexes to your <table_name> and even to your <temp_table>. It will greatly improve execution time.
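
As a rough (untested) sketch of this, assuming a global temporary table temp_ids(id) created once with ON COMMIT DELETE ROWS, a java.sql.Connection conn with auto-commit off, and a java.util.List<String> ids:

// 1) Populate the temporary table in a batch
try (PreparedStatement ins = conn.prepareStatement(
        "insert into temp_ids (id) values (?)")) {
    for (String id : ids) {
        ins.setString(1, id);
        ins.addBatch();
    }
    ins.executeBatch();
}

// 2) Delete using the correlated EXISTS
try (Statement del = conn.createStatement()) {
    del.executeUpdate(
        "delete from <table_name> m"
      + " where exists (select t.id from temp_ids t where m.id = t.id)");
}

conn.commit();   // ON COMMIT DELETE ROWS also clears temp_ids here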

– STaefi
  • For Oracle the [limit is always 1000](http://stackoverflow.com/a/19003103/266304), and isn't related to the size of the items. – Alex Poole Feb 05 '16 at 08:04
  • @AlexPoole: Thanks for the hint. That's right, I will edit my answer. – STaefi Feb 05 '16 at 08:06
  • I would be very surprised if the `IN` version was different than the `EXISTS` version performance-wise. – gpeche Feb 07 '16 at 21:56

The WHERE ... IN (...) approach is the way to go.

The IN clause can reference a temporary table that you've populated (your original idea), or it can contain any chosen (fixed) number of ? parameters. That will reduce the number of DB round trips by a factor equal to the chosen number, though not necessarily to one. Iterate over your collection and process it in chunks.
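
A rough (untested) sketch of that chunking, assuming a java.sql.Connection conn with auto-commit off, a java.util.List<String> ids, and an arbitrary chunk size of 200:

final int CHUNK = 200;
// One reusable statement with a fixed number of ? placeholders
String sql = "delete from <table_name> where id in ("
    + String.join(",", java.util.Collections.nCopies(CHUNK, "?")) + ")";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    for (int i = 0; i < ids.size(); i += CHUNK) {
        List<String> part = ids.subList(i, Math.min(i + CHUNK, ids.size()));
        for (int j = 0; j < CHUNK; j++) {
            // repeat the last value so every placeholder in the final chunk is bound
            ps.setString(j + 1, part.get(Math.min(j, part.size() - 1)));
        }
        ps.executeUpdate();
    }
    conn.commit();
}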

– Erwin Smout
  • So you mean to add 10,000 question marks to a prepared statement and use a Java loop with preparedStatement.setString(1, value), preparedStatement.setString(2, value), ...? – Shiva Mothkuri Feb 05 '16 at 07:45
  • No, I meant any chosen number of question marks. Say, 50. Or 100. Or 200. Apparently, the other answers reveal that 10000 will be impossible because of imposed limits. – Erwin Smout Feb 05 '16 at 10:50

Try it like this:

Delete from <table_name> where
    id in (1, 2, 3, ..., 1000)
    or id in (1001, 1002, ..., 2000)
    ...
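
A rough (untested) sketch of building that single statement in Java, assuming a java.sql.Connection conn, a java.util.List<String> ids, and numeric id values (add quoting/escaping if id is a character column):

StringBuilder sql = new StringBuilder("delete from <table_name> where ");
for (int i = 0; i < ids.size(); i += 1000) {   // 1,000 is Oracle's IN-list limit
    if (i > 0) {
        sql.append(" or ");
    }
    sql.append("id in (")
       .append(String.join(",", ids.subList(i, Math.min(i + 1000, ids.size()))))
       .append(")");
}
try (Statement st = conn.createStatement()) {
    st.executeUpdate(sql.toString());
    conn.commit();
}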