Ordered DELETE of records in self-referencing table

Question

I need to delete a subset of records from a self referencing table. The subset will always be self contained (that is, records will only have references to other records in the subset being deleted, not to any records that will still exist when the statement is complete).

My understanding is that this might cause an error if one of the records is deleted before the record referencing it is deleted.

First question: does postgres do this operation one-record-at-a-time, or as a whole transaction? Maybe I don't have to worry about this problem?

Second question: is the order of deletion of records consistent or predictable?

I am obviously able to write specific SQL to delete these records without any errors, but my ultimate goal is to write a regression test to show the next person after me why I wrote it that way. I want to set up the test data in such a way that a simplistic delete statement will consistently fail because of the records referencing the same table. That way if someone else messes with the SQL later, they'll get notified by the test suite that I wrote it that way for a reason.

Anyone have any insight?

EDIT: just to clarify, I'm not trying to work out how to delete the records safely (that's simple enough). I'm trying to figure out what set of circumstances will cause such a DELETE statement to consistently fail.

EDIT 2: Abbreviated answer for future readers: this is not a problem. By default, postgres checks the constraints at the end of each statement (not per-record, not per-transaction). Confirmed in the docs here: http://www.postgresql.org/docs/current/static/sql-set-constraints.html And by the SQLFiddle here: http://sqlfiddle.com/#!15/11b8d/1

I guess you would have to do a few insert and then an update, so that, say, record number one refers to another record that has been inserted later. At least, I would do it this way in MySQL. — mika, Jun 06 '14 at 05:42
I tried inserting the rows in different orders, and also updating them with the offending keys after the insertions were done (just in case the update statement somehow affected the delete order). No dice, the DELETE statement always succeeded (which I found pretty strange). — Nick W., Jun 06 '14 at 06:28
@mika: MySQL handles constraints differently than Postgres. It evaluates the constraints during deletion on a row-by-row basis. Postgres (and basically all other DBMS) evaluate the constraint at the end of the statement. — , Jun 06 '14 at 08:28
@a_horse_with_no_name Good to know. That sounds a good reason to start working with Postgres :) Thanks. — mika, Jun 06 '14 at 22:32

score 4 · Answer 1 · answered Jun 06 '14 at 08:29

4

In standard SQL, and I believe PostgreSQL follows this, each statement should be processed "as if" all changes occur at the same time, in parallel.

So the following code works:

CREATE TABLE T (ID1 int not null primary key,ID2 int not null references T(ID1));
INSERT INTO T(ID1,ID2) VALUES (1,2),(2,1),(3,3);
DELETE FROM T WHERE ID2 in (1,2);

Where we've got circular references involved in both the INSERT and the DELETE, and yet it works just fine.

fiddle

answered Jun 06 '14 at 08:29

Damien_The_Unbeliever

234,701
27
340
448

This answer made the one I accepted much clearer to me, thanks heaps for the SQLFiddle, but the other one actually was correct, and was posted first, so I had to award the point. Much appreciated. – Nick W. Jun 06 '14 at 10:32

score 3 · Accepted Answer · answered Jun 06 '14 at 06:18

A single DELETE with a WHERE clause matching a set of records will delete those records in an implementation-defined order. This order may change based on query planner decisions, statistics, etc. No ordering guarantees are made. Just like SELECT without ORDER BY. The DELETE executes in its own transaction if not wrapped in an explicit transaction, so it'll succeed or fail as a unit.

To force order of deletion in PostgreSQL you must do one DELETE per record. You can wrap them in an explicit transaction to reduce the overhead of doing this and to make sure they all happen or none happen.

PostgreSQL can check foreign keys at three different points:

The default, NOT DEFERRABLE: checks for each row as the row is inserted/updated/deleted
DEFERRABLE INITIALLY IMMEDIATE: Same, but affected by SET CONSTRAINTS DEFERRED to instead check at end of transaction / SET CONSTRAINTS IMMEDIATE
DEFERRABLE INITIALLY DEFERRED: checks all rows at the end of the transaction

In your case, I'd define your FOREIGN KEY constraint as DEFERRABLE INITIALLY IMMEDIATE, and do a SET CONSTRAINTS DEFERRED before deleting.

(Actually if I vaguely recall correctly, despite the name IMMEDIATE, DEFERRABLE INITIALLY IMMEDIATE actually runs the check at the end of the statement instead of the default of after each row change. So if you delete the whole set in a single DELETE the checks will then succeed. I'll need to double check).

(The mildly insane meaning of DEFERRABLE is IIRC defined by the SQL standard, along with gems like a TIMESTAMP WITH TIME ZONE that doesn't have a time zone).

For a DELETE on a self-referencing table there is no need for a deferrable constraint as the FK constraint will only be checked at the end of the statement, not for each row. http://sqlfiddle.com/#!15/8d424/1 The deferrable constraint would only be needed when running a multi-row **UPDATE** that changes a PK value to one that already exists (e.g. when renumbering the PK) — , Jun 06 '14 at 08:30
Indeed, the docs do say that. Just needed a nudge in the right direction, thanks everyone! — Nick W., Jun 06 '14 at 10:31
Huh, I never spotted that. @NickW., probably better to accept Damien's answer instead of mine. — Craig Ringer, Jun 06 '14 at 13:09
@NickW.: We discussed `DEFERRABLE` constraints under this related question: http://stackoverflow.com/questions/10032272/constraint-defined-deferrable-initially-immediate-is-still-deferred — Erwin Brandstetter, Jun 08 '14 at 02:08

score 0 · Answer 3 · edited Jun 06 '14 at 08:20

1) It will do as transaction if enclosed within "BEGIN/COMMIT". Otherwise in general no.

For more see http://www.postgresql.org/docs/current/static/tutorial-transactions.html

The answer in general to your question depends on how is self-referencing implemented. If it is within application logic, it is solely your responsibility to check the things yourself.

Otherwise, it is in general possible to restrict or cascade deletes for rows with foreign keys and DELETE CASCADE . However, as far as PG docs go, I understand we are talking about referencing columns in other tables, not sure if same-table foreign keys are supported:

http://www.postgresql.org/docs/current/static/ddl-constraints.html#DDL-CONSTRAINTS-FK

2) In general, the order of deletion will be the order in which you issue delete statements. If you want them all to be "uninterruptible" with no other statements modifying table in between, you enclose them in a transaction.

As a warning, I may be wrong, but what you seem to be trying to do, must not be done. You should not have to rely on some esoteric "order of deletion" or some other undocumented and/or implicit features of database. The underlying logic does not seem sound, there should be another way.

If there is a single delete that affects multiple records, then it is practically "inside a transaction" and either all will succeed or all will fail. Same-table foreign keys should work without problems with cascade. — Sami Kuhmonen, Jun 06 '14 at 06:01

score 0 · Answer 4 · answered Jun 06 '14 at 06:04

If you issue a single DELETE that affects multiple records (like delete from x where id>100), that will be handled as a single transaction and either all will succeed or fail. If multiple DELETEs, you have to put them in a transaction yourself.

There will be problems. If you have a constraint with DELETE CASCADE, you might delete more than you want with a single DELETE. If you don't, the integrity check might stop you from deleting. Constraints other than NO ACTION are not deferrable, so you'd have to disable the constraint before delete and enable it afterwards (basically drop/create, which might be slow).

If you have multiple DELETEs, then the order is as the DELETE statements are sent. If a single DELETE, the database will delete in the order it happens to find them (index, oids, something else...).

So I would also suggest thinking about the logic and maybe handling the deletes differently. Can you elaborate more on the actual logic? A tree in database?

Ordered DELETE of records in self-referencing table

4 Answers4