If I have a simple Users table in my database and a simple Items table with Users.id as a foreign key, like this:

CREATE TABLE Users
(id UNIQUEIDENTIFIER DEFAULT (NEWID()) NOT NULL,
name NVARCHAR (MAX) NULL,
email NVARCHAR (128) NULL,
authenticationId NVARCHAR (128) NULL,
createdAt DATETIME DEFAULT GETDATE() NOT NULL,
PRIMARY KEY (id))

CREATE TABLE Items
(id UNIQUEIDENTIFIER DEFAULT (NEWID()) NOT NULL,
userId UNIQUEIDENTIFIER NOT NULL,
name NVARCHAR (MAX) NULL,
description NVARCHAR (MAX) NULL,
isPublic BIT DEFAULT 0 NOT NULL,
createdAt DATETIME DEFAULT GETDATE() NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (userId) REFERENCES Users (id))

If a user is removed from the table, I need all of the related items to be removed first to avoid breaking referential integrity constraints. This is easily done with ON DELETE CASCADE:

CREATE TABLE Items
(id UNIQUEIDENTIFIER DEFAULT (NEWID()) NOT NULL,
userId UNIQUEIDENTIFIER NOT NULL,
name NVARCHAR (MAX) NULL,
description NVARCHAR (MAX) NULL,
isPublic BIT DEFAULT 0 NOT NULL,
createdAt DATETIME DEFAULT GETDATE() NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (userId) REFERENCES Users (id) ON DELETE CASCADE)

But if I also have collections which reference users, and a table collecting items into collections, I am in trouble: the following additional code does not work.

CREATE TABLE Collections
(id UNIQUEIDENTIFIER DEFAULT (NEWID()) NOT NULL,
userId UNIQUEIDENTIFIER NOT NULL,
name NVARCHAR (MAX) NULL,
description NVARCHAR (MAX) NULL,
isPublic BIT DEFAULT 0 NOT NULL,
layoutSettings NVARCHAR (MAX) NULL,
createdAt DATETIME DEFAULT GETDATE() NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (userId) REFERENCES Users (id) ON DELETE CASCADE)

CREATE TABLE CollectedItems
(itemId UNIQUEIDENTIFIER NOT NULL,
collectionId  UNIQUEIDENTIFIER NOT NULL,
createdAt DATETIME DEFAULT GETDATE() NOT NULL,
PRIMARY KEY CLUSTERED (itemId, collectionId),
FOREIGN KEY (itemId) REFERENCES Items (id) ON DELETE CASCADE,
FOREIGN KEY (collectionId) REFERENCES Collections (id) ON DELETE CASCADE)

The error indicates that this "may cause cycles or multiple cascade paths". The ways around this that I see recommended are to:

  1. Redesign the tables, but I cannot see how; or
  2. Use triggers, which is often described as "a last resort".

So I remove the ON DELETE CASCADE and instead use INSTEAD OF DELETE triggers, like this:

CREATE TRIGGER DELETE_User
   ON Users
   INSTEAD OF DELETE
AS 
BEGIN
 SET NOCOUNT ON
 DELETE FROM Items WHERE userId IN (SELECT id FROM DELETED)
 DELETE FROM Collections WHERE userId IN (SELECT id FROM DELETED)
 DELETE FROM Users WHERE id IN (SELECT id FROM DELETED)
END
GO

CREATE TRIGGER DELETE_Item
   ON Items
   INSTEAD OF DELETE
AS 
BEGIN
 SET NOCOUNT ON
 DELETE FROM CollectedItems WHERE itemId IN (SELECT id FROM DELETED)
 DELETE FROM Items WHERE id IN (SELECT id FROM DELETED)
END
GO

CREATE TRIGGER DELETE_Collection
   ON Collections
   INSTEAD OF DELETE
AS 
BEGIN
 SET NOCOUNT ON
 DELETE FROM CollectedItems WHERE collectionId IN (SELECT id FROM DELETED)
 DELETE FROM Collections WHERE id IN (SELECT id FROM DELETED)
END

However this fails, albeit subtly. I have a bunch of unit tests (written in xUnit). Individually the tests always pass, but run en masse some randomly fail with a SQL deadlock. In another answer I was pointed to SQL Profiler, which shows a deadlock between two delete calls.

What is the correct way to solve these diamond-shaped delete cascades?

  • Which RDBMS is this for? Please add a tag to specify whether you're using `mysql`, `postgresql`, `sql-server`, `oracle` or `db2` - or something else entirely. – marc_s Feb 12 '16 at 08:53
  • Since you say the tests only fail "en masse", I guess the deadlock is not occurring among your DELETE triggers, which seem fine to me. So maybe you need to look further than just the triggers. – Oliver Feb 12 '16 at 10:22
  • They are on the delete triggers (though it's hard to get that out of the profiler). I'll edit the question and put in a screenshot and some more detail of the deadlock, though it's a bit tricky as the sample I've given in the question is simplified from the real tables. – dumbledad Feb 12 '16 at 10:24
  • ...Can "other" users collect a different user's items? If so, I'm betting **that's** what's causing the problems: you have conflicting deletes both going for the same thing. – Clockwork-Muse Feb 12 '16 at 11:03
  • No; but there are other tables allowing UserA to favourite some of UserB's collections. My unit tests do not exercise that (yet) so that's not the source of the deadlock. Keep those suggestions coming through, thanks – dumbledad Feb 12 '16 at 11:14
  • Actually, I'm headdesking a little bit. What your statement likely meant is that your xUnit tests **are stateful**. If at all possible, each individual test should be running against its own copy of the (mock, test, usually in-memory only) database. But that doesn't solve the actual deadlocking issue. Which is probably because one test was deleting a user at the same time a different one was deleting a collection or something; xUnit runs tests concurrently (multiple threads), which will spawn multiple connections, each with their own transactions. – Clockwork-Muse Feb 12 '16 at 11:36
  • I've written the tests to not be stateful, and I believe each test is fired from a separate class instance by xUnit. There is a fair amount of context set-up in the constructor and clean-up in the `IDisposable.Dispose` method, but as the ids on my tables are Guids the concurrently running tests shouldn't interfere with each other. Thanks again though, a good thought. – dumbledad Feb 12 '16 at 13:53

3 Answers


Several ways of working come to mind:

  1. Don't delete the user, simply deactivate it. Add a BIT field active and set it to 0 for deactivated users. Simple, easy, fast, and it keeps a record of which users existed in your system and what their associated state was. Usually you are not supposed to delete such information about a user; you want to keep it for future reference. (A minimal sketch of this follows the list.)

  2. Don't rely on cascades and triggers; handle it yourself in code. Cascades and triggers can be hard to maintain and their behaviour hard to predict (cf. the deadlock you experience).

  3. If you can't or don't want to do either of the above, consider doing all of the deleting from the Users delete trigger: first disable the delete triggers on the referring tables, do all your deletes, then re-enable the delete triggers on the referring tables. (A sketch of this also follows the list.)
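
A minimal sketch of option 1, assuming you add the suggested active column to Users (the procedure name DeactivateUser is just for illustration):

ALTER TABLE Users ADD active BIT NOT NULL DEFAULT 1
GO

CREATE PROCEDURE dbo.DeactivateUser
    @ParamUserID UNIQUEIDENTIFIER
AS
BEGIN
    SET NOCOUNT ON;
    -- "Deleting" a user becomes a flag update; all related rows stay intact
    UPDATE Users SET active = 0 WHERE id = @ParamUserID;
END

And a rough sketch of option 3, reworking the DELETE_User trigger from the question so that it bypasses the other two triggers and does all the deleting itself (one possible shape, not a drop-in fix):

ALTER TRIGGER DELETE_User
   ON Users
   INSTEAD OF DELETE
AS
BEGIN
 SET NOCOUNT ON
 -- Bypass the child tables' own INSTEAD OF DELETE triggers for the duration of this delete
 DISABLE TRIGGER DELETE_Item ON Items
 DISABLE TRIGGER DELETE_Collection ON Collections

 -- Delete in dependency order: junction rows, then child rows, then the users themselves
 DELETE FROM CollectedItems
 WHERE itemId IN (SELECT id FROM Items WHERE userId IN (SELECT id FROM DELETED))
    OR collectionId IN (SELECT id FROM Collections WHERE userId IN (SELECT id FROM DELETED))
 DELETE FROM Items WHERE userId IN (SELECT id FROM DELETED)
 DELETE FROM Collections WHERE userId IN (SELECT id FROM DELETED)
 DELETE FROM Users WHERE id IN (SELECT id FROM DELETED)

 -- Re-enable the triggers
 ENABLE TRIGGER DELETE_Item ON Items
 ENABLE TRIGGER DELETE_Collection ON Collections
END

Note that DISABLE TRIGGER affects every session, not just the current one, so option 3 is not safe under concurrent deletes (see the comment below).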

    ... you can still get deadlocks from code, especially as you're still running (pretty much) the exact same statement, just from a slower location. The _visibility_ is maybe better (that such a thing happens), but the actual prediction of ordering doesn't change. I _really_ dislike disabling triggers as part of a "standard, client-initiated action". For one thing, it would prevent concurrent calls. – Clockwork-Muse Feb 12 '16 at 10:19
  • Drat - understanding and fixing those deadlocks is what I'm after. – dumbledad Feb 12 '16 at 10:22
  • @Clockwork-Muse As I said in 2, triggers are such a mess... I only use those if there is absolutely no other way. It turns out that I never use them. I'd go for option 1, why would you ever want to delete a user if you can just deactivate them? – TT. Feb 12 '16 at 10:40
  • I agree, and the history of 'deleted' users would be interesting. But several team members, especially those with legal or privacy experience, want a user who removes their account to have all their data deleted. So 1 is out for us. – dumbledad Feb 12 '16 at 11:18

Another thing to try is setting the isolation level to SERIALIZABLE in your triggers when you delete a user/item/collection. Since you are possibly deleting many items/collections/collected items when deleting a user, having another transaction INSERT something during this run can cause problems. SERIALIZABLE solves that to some extent.

SQL Server uses that isolation level on cascading deletes for exactly this reason: https://learn.microsoft.com/en-us/archive/blogs/conor_cunningham_msft/conor-vs-isolation-level-upgrade-on-updatedelete-cascading-ri
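
As a rough sketch, reusing the DELETE_User trigger from the question (whether this removes your particular deadlock depends on what the competing transactions are doing):

ALTER TRIGGER DELETE_User
   ON Users
   INSTEAD OF DELETE
AS
BEGIN
 SET NOCOUNT ON
 -- Raise the isolation level for the deletes done inside this trigger;
 -- it reverts to the caller's level when the trigger returns
 SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

 DELETE FROM Items WHERE userId IN (SELECT id FROM DELETED)
 DELETE FROM Collections WHERE userId IN (SELECT id FROM DELETED)
 DELETE FROM Users WHERE id IN (SELECT id FROM DELETED)
END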

  • Well, you can _still_ deadlock with `SERIALIZABLE` (in fact, it's easier given the way it has to escalate certain things). You have to be extra careful with the ordering of operations to prevent such references from occurring – Clockwork-Muse Feb 12 '16 at 11:22

I prefer not to have automatic cascade operations, be it DELETE or UPDATE, just for the sake of peace of mind. Imagine you've configured your cascade deletes and then, due to some bug, your program tries to delete the wrong user even though the database has data related to it. All related data in the related tables will be gone without any warning.

Normally I make sure that all related data is deleted first, using explicit separate procedures, one for each related table, and then I delete the row in the master table. The delete will succeed because by then there are no child rows in the referencing tables.

For your example I'd have a dedicated stored procedure DeleteUser with one parameter, UserID, which knows which tables are related to the user and in what order the details should be deleted. This procedure is tested and is the only way to remove a user. If the rest of the program tried, by mistake, to delete a row directly from the Users table, the attempt would fail if there were data in the related tables. If the mistakenly deleted user didn't have any details, the attempt would go through, but at least you would not lose a lot of data.

For your schema the procedure may look like this:

CREATE PROCEDURE dbo.DeleteUser
    @ParamUserID UNIQUEIDENTIFIER
AS
BEGIN
    SET NOCOUNT ON; SET XACT_ABORT ON;

    BEGIN TRANSACTION;
    BEGIN TRY
        -- Delete from CollectedItems going through Items
        DELETE FROM CollectedItems
        WHERE CollectedItems.itemId IN
        (
            SELECT Items.id
            FROM Items
            WHERE Items.userId = @ParamUserID
        );

        -- Delete from CollectedItems going through Collections
        DELETE FROM CollectedItems
        WHERE CollectedItems.collectionId IN
        (
            SELECT Collections.id
            FROM Collections
            WHERE Collections.userId = @ParamUserID
        );

        -- Delete Items
        DELETE FROM Items WHERE Items.userId = @ParamUserID;

        -- Delete Collections
        DELETE FROM Collections WHERE Collections.userId = @ParamUserID;

        -- Finally delete the main user
        DELETE FROM Users WHERE ID = @ParamUserID;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
        ...
        -- process the error
    END CATCH;
END
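
Removing a user from application code then goes through this procedure only, along the lines of (the email lookup is just a hypothetical way of obtaining the id):

DECLARE @UserToDelete UNIQUEIDENTIFIER =
    (SELECT id FROM Users WHERE email = N'someone@example.com');
EXEC dbo.DeleteUser @ParamUserID = @UserToDelete;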

If you really want to set up cascade deletes, then I'd define one trigger, only on the Users table. Again, there would be no foreign keys with a cascade delete, but the trigger on the Users table would have logic very similar to the procedure above.
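
For completeness, a sketch of what that single trigger might look like, using the same delete order as the procedure above (it assumes no cascading foreign keys and no triggers on the child tables):

CREATE TRIGGER DELETE_User
   ON Users
   INSTEAD OF DELETE
AS
BEGIN
 SET NOCOUNT ON
 -- Junction rows first, reachable through either Items or Collections ...
 DELETE FROM CollectedItems
 WHERE itemId IN (SELECT i.id FROM Items i JOIN DELETED d ON i.userId = d.id)
    OR collectionId IN (SELECT c.id FROM Collections c JOIN DELETED d ON c.userId = d.id)

 -- ... then the two child tables ...
 DELETE FROM Items WHERE userId IN (SELECT id FROM DELETED)
 DELETE FROM Collections WHERE userId IN (SELECT id FROM DELETED)

 -- ... and finally the users themselves
 DELETE FROM Users WHERE id IN (SELECT id FROM DELETED)
END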
