
We have some obsolete historical data in a database, amounting to 190 million (19 crore) rows and about 33 GB. I have now been given the task of deleting these rows in one go, and if anything breaks along the way, I should be able to roll back the transaction.

I will select them based on a flag like deleted = '1', which by my estimate matches 190 million out of 200 million rows. So first I have to run a select and then delete those ids.

As mentioned in this article, deleting 1.5 million records took 4 hours; that is far fewer rows than in my case, and I wonder how long a single-delete approach would take for 190 million records.

Should I use Spring Batch to select the ids of the rows and then delete them batch by batch, or issue a single statement passing the ids in an IN clause?

What would be the better approach? Please suggest.

Anand Kadhi
  • If you have the option, dropping the entire table might not be a bad thing (i.e., if your tables contain only historical data) – TheLostMind Jul 15 '16 at 06:59
  • If it's a one-time activity, why do it through Java? Do it directly in Oracle using PL/SQL. I'm a bit rusty with Oracle at the moment, but you can create a snapshot and then run your delete SQL, and if there are any issues just roll back to the snapshot (see the sketch after these comments). – Yogesh_D Jul 15 '16 at 07:04
  • Adding to @TheLostMind: and if only a few rows will persist, it might be an option to have a transaction 1) insert the remaining rows into another table, 2) drop the table, 3) rename the other table, 4) commit. – Dilettant Jul 15 '16 at 07:06
  • @Dilettant - Yes, that's perhaps the right approach (assuming his data is stored according to our assumptions) – TheLostMind Jul 15 '16 at 07:08
  • @TheLostMind Actually I also have some records which shouldn't be deleted (I edited the question); out of 200 million I have to select the ids of 190 million and remove them. – Anand Kadhi Jul 15 '16 at 07:35
  • Anandj.Kadhi - Then use what @Dilettant suggested. Fits your needs IMHO – TheLostMind Jul 15 '16 at 07:39
  • @Anandj.Kadhi Oracle has a limit of 1,000 items in an IN clause (see http://stackoverflow.com/questions/400255/how-to-put-more-than-1000-values-into-an-oracle-in-clause), so an IN clause can be a problem depending on your batch size. – Yogesh_D Jul 15 '16 at 08:26
  • It seems you have an is_delete column that tells you whether to delete a row. Why not write a module in your app that runs during non-critical hours, with a time limit on its run? During each run it can pick up batches of data to delete and clean them up, including referential data as needed. – Yogesh_D Jul 15 '16 at 08:28
  • One more comment: make sure your DBAs are aware of this cleanup, as deletion will fragment the indexes, and they might need a rebuild. – Yogesh_D Jul 15 '16 at 08:32
  • Careful with @Dilettant's approach: in Oracle, DDL such as DROP TABLE or ALTER TABLE has an implicit commit. – vercelli Jul 15 '16 at 10:05
  • So as @vercelli notes, ensuring a snapshot-based rollback also makes sense; it might be the only viable option (for rare maintenance tasks like that). – Dilettant Jul 15 '16 at 10:12
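
The snapshot idea from the comments could look roughly like this with a restore point. This is only a sketch, assuming Oracle 10g or later, a table named history, and undo retention large enough to cover the whole delete; the restore-point name is illustrative:

alter table history enable row movement;   -- required for FLASHBACK TABLE

create restore point before_purge;

delete from history where deleted = '1';
-- ... verify the result, then either commit or roll back ...
commit;

-- Even after the commit, the table can be flashed back while undo lasts:
-- flashback table history to restore point before_purge;

drop restore point before_purge;

For 190 million rows, the undo this relies on is enormous, which is one reason several answers below avoid a single giant DELETE altogether.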

4 Answers


Why not move the required data from the historical table to a new table and drop the old table entirely? You can rename the new table to the old table's name afterwards.
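
A minimal sketch of that idea, assuming the table is named history and the rows to keep have deleted = '0'; all names are illustrative, and indexes, constraints, triggers, and grants do not follow the data, so they must be recreated by hand:

create table history_keep as
  select * from history where deleted = '0';   -- copy only the ~10 million rows to keep

-- Careful: DDL such as DROP TABLE commits implicitly (see the comments above),
-- so verify history_keep before this point.
drop table history;
rename history_keep to history;

-- Recreate indexes, constraints, triggers, and grants on the new table here.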

XING
  • Assuming there is no referential integrity in play, this is an epic idea. But so far the question is about doing it with Spring Data, so your answer doesn't really apply :/ – Gimby Jul 15 '16 at 07:43
  • That's true. I replied because it was under the oracle tag as well. Also, why complicate things when you can achieve it easily? – XING Jul 15 '16 at 07:48

You can copy the required data from the historical table to a new table, drop the old table entirely, and rename the new table to the old table's name later -- as said by Raj in the post above. That is the best way to do it.

You can also use the NOLOGGING and PARALLEL options to speed this up, for example:

create table History_new parallel 4 nologging as select /*+ parallel(h 4) */ * from History h where col1 = 1 and ... ;
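
One caveat worth adding: with NOLOGGING the copied data generates no redo, so it cannot be recovered from the redo logs; take a backup once the copy is verified. You may also want to reset the new table's attributes afterwards (a sketch, assuming the statement above):

alter table History_new logging;     -- resume normal redo generation
alter table History_new noparallel;  -- reset the default degree of parallelism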

Ramki

If doing it in Java is not mandatory, I'd create a PL/SQL procedure, open a cursor, and use DELETE ... WHERE CURRENT OF. Maybe it's not super fast, but it's safe because you will have no rollback segment problems. A normal DELETE, even without an explicit transaction, is an atomic operation that must be rolled back if something fails.
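
A sketch of that approach, assuming a table named history with the deleted = '1' flag from the question (the id column is illustrative); WHERE CURRENT OF requires a FOR UPDATE cursor:

declare
  cursor c is
    select id from history
    where deleted = '1'
    for update;       -- locks the rows; required for WHERE CURRENT OF
begin
  for r in c loop
    delete from history where current of c;
  end loop;
  commit;             -- a single commit; committing inside the loop would
end;                  -- invalidate the FOR UPDATE cursor (ORA-01002)
/

Note that keeping all 190 million deletes in one transaction still needs enough undo space to hold them, so check with your DBA first.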

Lluis Martinez

Maybe what you describe is usual, normal performance for Java, but on my notebook deleting 1M records takes about a minute - without Java, of course.

If you want to do it properly, I'd say you should use partitions. First, convert the plain table(s) into partitioned one(s), with all data in one (current) partition. Then prepare "historical" partitions and move the unnecessary data into them. After that you'll be ready for anything: you can take that data offline (and restore it when needed), and you'll be able to detach it in seconds using EXCHANGE PARTITION, and so on.
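
A rough sketch of that route, assuming a LIST partition on the deleted flag; all names and the partitioning scheme are illustrative, and migrating an existing 200-million-row table into this shape is itself a sizeable operation:

create table history_part (
  id      number,
  deleted varchar2(1)
  -- ... remaining columns ...
)
partition by list (deleted) (
  partition p_current    values ('0'),
  partition p_historical values ('1')
);

-- Needed if the deleted flag is updated in place (the row changes partition):
alter table history_part enable row movement;

-- Dropping the historical rows then becomes a fast DDL operation:
alter table history_part drop partition p_historical;

-- Or detach them into a standalone table while keeping the data around:
create table history_archive as select * from history_part where 1 = 0;
alter table history_part exchange partition p_historical with table history_archive;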

Sanders the Softwarer