
I have been able to get the delete down from about 3 minutes to 3-7 seconds. However, I'm not sure I'm following the ideal route for deleting information. From what I've been reading, the ideal way to delete is the following (a rough SQL sketch follows the list):

  1. Remove all foreign key constraints.
  2. Delete the data.
  3. Re-add the foreign key constraints.
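
A minimal sketch of that pattern, assuming a hypothetical parent table parent and one child table table1 whose constraint is named table1_parent_fk (all of these names are placeholders, not my real schema):

    -- 1. Remove the foreign key constraint (repeat per child table).
    ALTER TABLE table1 DROP CONSTRAINT table1_parent_fk;

    -- 2. Delete the data.
    DELETE FROM parent WHERE id BETWEEN 1000 AND 2000;

    -- 3. Re-add the foreign key constraint.
    ALTER TABLE table1
        ADD CONSTRAINT table1_parent_fk
        FOREIGN KEY (parent_id) REFERENCES parent (id);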

However, this gets a little messy since we also want to delete the related rows behind the foreign keys. So I have come to the following solution.

  1. Remove all foreign key constraints.
  2. Get the list of all the IDs that are being deleted.
  3. For each ID in the list:
    • Delete from table1 where the ID matches
    • Delete from table2 where the ID matches
    • Delete from table3 where the ID matches
    • Delete from table4 where the ID matches
    • Delete from table5 where the ID matches
  4. Re-add the foreign key constraints.

It seems that the time sink is the foreach loop that deletes from each table one ID at a time. In Postgres, is there a way to delete from the tables in parallel? Or should I try deleting with groups of IDs instead of single IDs, something like the batched form below?
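
By a batched form I mean one statement per table with the whole ID list, for example (table names and ID values here are just illustrative):

    -- One statement per table instead of one per ID.
    DELETE FROM table1 WHERE id = ANY (ARRAY[101, 102, 103]);
    DELETE FROM table2 WHERE id = ANY (ARRAY[101, 102, 103]);
    -- ...and likewise for table3 through table5.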

I am also creating an index and clustering on the ID before the delete statements are run.

I am open to any suggestions.

  • I don't understand why you're removing the foreign keys when you actually want to apply them; were they slow? I can't see why they should be slower than performing the same task manually – Dave Jun 03 '14 at 21:15
  • and just to confirm: are *all* the relevant columns indexed? (all ID columns and the columns which reference them) – Dave Jun 03 '14 at 21:17
  • All of the relevant columns are indexed. When I was deleting without removing the foreign keys, the query execution took about 3-5 minutes. However, by running them manually I was able to get them down to 3-7 seconds. Not sure exactly what the difference is. I am able to do the same queries in SQL Server in about 0.07 seconds because the cascade delete seemed to be more effective. – Derked Jun 03 '14 at 21:28
  • It sounds to me as if your Postgres tables have something amiss in their setup, because it shouldn't be so slow. Can you post the create table script for the main table and one of the sub-tables? (http://stackoverflow.com/a/2594564/1180785) (also it might be worth trying this with just one sub-table, then two, etc. to narrow down the issue) – Dave Jun 03 '14 at 21:34
  • Also are you deleting one item at a time? Try to use a batch or a single statement if possible. I don't know which language you're driving it from, but most have the capability to batch statements, often converting them to more efficient single statements in the background. – Dave Jun 03 '14 at 21:38
  • Found the problem! :) When I ran the test before my indexes and cluster weren't being set up at the right time. Now I'm able to get an average runtime of 0.03 seconds :) – Derked Jun 03 '14 at 21:41
  • cool. If you think a particular debugging step helped you, I'd encourage you to post the details as an answer (you can accept your own answer after a day I believe) in case somebody has a similar problem in the future. – Dave Jun 03 '14 at 21:43

1 Answer


Looks like I overcomplicated the query a little bit. I was able to solve it with the following steps.

  1. Creating an index on each column that I know will be queried in each table.
  2. Running CLUSTER on that index before the query; this improved the time tremendously (see the sketch below).
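
For example (the index and table names here are only illustrative, not my real schema):

    -- Hypothetical names; run before the big delete.
    CREATE INDEX table1_id_idx ON table1 (id);
    CLUSTER table1 USING table1_id_idx;
    ANALYZE table1;  -- refresh planner statistics after the reorder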

I am new to Postgres, and its clustered index is similar to, but different from, what you may be used to in SQL Server. See the CLUSTER documentation for more information.
