31

The lead developer on a project I'm involved in says it's bad practice to rely on cascades to delete related rows.

I don't see how this is bad, but I would like to know your thoughts on if/why it is.

Michael Petrotta
  • 59,888
  • 27
  • 145
  • 179
Benbob
  • 13,876
  • 18
  • 79
  • 114
  • Possible duplicate of [When/Why to use Cascading in SQL Server?](http://stackoverflow.com/questions/59297/when-why-to-use-cascading-in-sql-server). Also, I tend to agree a lot more with the answers of the other question, which are quite different than the ones here. – Eyal Roth Apr 09 '17 at 17:57

7 Answers7

27

I'll preface this by saying that I rarely delete rows period. Generally most data you want to keep. You simply mark it as deleted so it won't be shown to users (ie to them it appears deleted). Of course it depends on the data and for some things (eg shopping cart contents) actually deleting the records when the user empties his or her cart is fine.

I can only assume that the issue here is you may unintentionally delete records you don't actually want to delete. Referential integrity should prevent this however. So I can't really see a reason against this other than the case for being explicit.

cletus
  • 616,129
  • 168
  • 910
  • 942
  • I have to agree with cletus here, unless there's a legal or real capacity issue...storage is so cheap and fast there's rarely a real reason to do a hard delete these days. +1 – Nick Craver Feb 05 '10 at 03:17
  • 15
    Here's one reason - deleted data creeping back into query results because someone forgot an `AND DELETED = 0`. If you do this, hide all deleted data behind a live-data-only view or similar, or you **will** run into problems. – Michael Petrotta Feb 05 '10 at 03:20
  • @Michael - Depends on your architecture, for example I have a linq project, and in the T4 templates it's trivial to add a `.Current()` extension method call in the `DataContext.GetTable` references it generates. I could see how this would be a problem in other situations, but not *always*. – Nick Craver Feb 05 '10 at 03:24
  • 1
    @Nick: True. But, you never touch your data with Query Analyzer, not even to to DBA work? No external reporting tools, Excel spreadsheets, etc.? – Michael Petrotta Feb 05 '10 at 03:34
  • 1
    Thats what views are for. Name the table `foo_historical` and the view `foo` if you have to. – Joe Koberg Feb 05 '10 at 03:36
  • @Michael - We do use external query tools, but typically LinqPad (so non-issue) for quick queries or Toad. I agree it could definitely lead to mistakes there. However, specific to my current project: cases when there people are looking at data that matters, they're going against a warehouse that doesn't import deleted rows. I guess a bit of a special situation there, so it could definitely be much worse in that area. – Nick Craver Feb 05 '10 at 03:43
  • 1
    I have to completely disagree with your preface - deleting data is fairly common in my experience. In fact, the opposite is quite rate. This seems to be a very controversial topic though. Take a look at [this question](http://stackoverflow.com/questions/2549839); it was marked as a duplicate of [this question](http://stackoverflow.com/questions/378331), but is far more popular and recent. – Eyal Roth Apr 09 '17 at 18:08
  • Thumbs up for soft deleted with views. At scale this solution becomes even more evident as the clear winner. – pim Apr 17 '18 at 14:57
  • 1
    When bad data is uploaded and you need to replace it you must DELETE it :) I like Cascade. I'm looking for information on how it works to justify to others why we should implement. It seems very efficient to me, but there are concerns about memory allocation.... – DonkeyKong Sep 03 '20 at 20:13
  • 1
    I actually use `ON DELETE CASCADE` _because_ I also do soft deletes. If I delete a row it's because I truly want all that info gone, such as GDPR requirements . – Mike Graf Apr 25 '23 at 17:53
25

I would say that you follow the principle of least surprise.

Cascading deletes should not cause unexpected loss of data. If a delete requires related records to be deleted, and the user needs to know that those records are going to go away, then cascading deletes should not be used. Instead, the user should be required to explicitly delete the related records, or be provided a notification.

On the other hand, if the table relates to another table that is temporary in nature, or that contains records that will never be needed once the parent entity is gone, then cascading deletes may be OK.

That said, I prefer to state my intentions explicitly by deleting the related records in code, rather than relying on cascading deletes. In fact, I've never actually used a cascading delete to implicitly delete related records. Also, I use soft deletion a lot, as described by cletus.

Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
  • 2
    That's a good principle, there are enough naturally occurring surprises - no need to manufacture more of them. – Rawheiser Sep 24 '14 at 18:48
  • 2
    Totally agree on this one other than the "marked solution", soft delete is not a golden. It can easily become a big headache when you have other unique constrain on the table other than PK. only do soft delete when you required to do so. – LeY Jul 23 '19 at 18:20
  • i like your considerations. i've, come to the conclusion that when used, a comment about it in the code will do. it beats orchestrating the deletion – symbiont Feb 18 '21 at 16:18
18

I never use cascading deletes. Why? Because it is too easy to make a mistake. Much safer to require client applications to explicitly delete (and meet the conditions for deletion, such as deleting FK referred records.)

In fact, deletions per se can be avoided by marking records as deleted or moving into archival/history tables.

In the case of marking records as deleted, it depends on the relative proportion of marked as deleted data, since SELECTs will have to filter on 'isDeleted = false' an index will only be used if less than 10% (approximately, depending on the RDBMS) of records are marked as deleted.

Which of these 2 scenarios would you prefer:

  1. Developer comes to you, says "Hey, this delete won't work". You both look into it and find that he was accidently trying to delete entire table contents. You both have a laugh, and go back to what you were doing.

  2. Developer comes to you, and sheepishly asks "Do we have backups?"

There's another great reason to not use cascading UPDATES or DELETES: they hold a serializable lock. Holding a serializable lock can kill performance.

Mitch Wheat
  • 295,962
  • 43
  • 465
  • 541
  • 1
    Let's say I'm backing an order out of the system: Order -> Line Items -> Comments. If I delete the order (confirmation prompt in the app, naturally), the rows in the other 2 tables are now useless. If you aren't setup to recover the parent record(n which case, you aren't deleting...) often the children no longer matter, easier to cascade delete them all together in one database hit from the start. Like I said, matter of application and opinion, but they **do** have their uses. – Nick Craver Feb 05 '10 at 03:38
  • 3
    @NickCraver, if you have foreign keys properly set up (And you are incompetent to design a database if you do not) then the scenario you describe cannot happen. If you are so foolish as to rely on an application for relationships,chances of database integrity problems approach 100%. – HLGEM Apr 17 '18 at 19:01
  • @HLGEM what is the "proper" way though - isn't that setting on delete cascade for these? Another table type where I often have those questions is on "many-to-many" relation tables, which combine entries from table A to B. If A (or B) gets deleted what should happen to this table? – paul23 Mar 24 '21 at 10:41
  • @Nick Craver: does the main stackoverflow database use cascading deletes? – Mitch Wheat Mar 30 '21 at 07:32
12

Another huge reason to avoid cascading deletes is performance. They seem like a good idea until you need to delete 10,000 records from the main table which in turn have millions of records in child tables. Given the size of this delete, it is likely to completely lock down all of the table for hours maybe even days. Why would you ever risk this? For the convenience of spending ten minutes less time writing the extra delete statements for one record deletes?

Further, the error you get when you try to delete a record that has a child record is often a good thing. It tells you that you don't want to delete this record becasue there is data that you need that you would lose if you did so. Cascade delete would just go ahead and delete the child records resulting in loss of information about orders for instance if you deleted a customer who had orders in the past. This sort of thing can thoroughly mess up your financial records.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • 1
    I don't understand your first scenario. How would the alternative work that would avoid the perf overhead? Would you simply not delete the child records? Or is it the fact that a delete somehow locks the tables that causes the problem, and separate deletes avoids this? – Merlyn Morgan-Graham Jan 25 '12 at 21:52
  • 1
    You would do the deletes from the child tables in batches and then the main delete. I would not allow anyone to ever use cascade delete in a production Enterprise type database. It is simply too likely to cause a production problem. – HLGEM Jan 25 '12 at 22:12
  • 1
    In case of many rows that needs to be deleted (in the referenced table), if anything, the performance of cascading deletes would be better; your code won't run faster than the one in the DB. The other scenario is strange to me - you can't "mistakenly" delete referencing rows. If you've declared a FK as cascading, it means that it is your intention of not allowing the referencing rows to exist without the referenced one, which is a completely acceptable use case. – Eyal Roth Apr 09 '17 at 18:16
  • @errr, too many people don't think through what they declared as cascading. SO they get rid of things where the FK cshould have stopped them from delting. I have had to deal with databases messed up this way. – HLGEM Apr 10 '17 at 13:28
  • 3
    @HLGEM this just means that this feature is being abused; it doesn't mean that it's a bad practice. – Eyal Roth Apr 12 '17 at 15:53
  • 4
    It is a bad practice for people learning SQL who don't know what they are doing to be given as a solution. – HLGEM Apr 12 '17 at 16:45
2

I was likewise told that cascading deletes were bad practice... and thus never used them until I came across a client who used them. I really didn't know why I was not supposed to use them but thought they were very convenient in not having to code out deleting all the FK records as well.

Thus I decided to research why they were so "bad" and from what I've found so far their doesn't to appear to be anything problematic about them. In fact the only good argument I've seen so far is what HLGLEM stated above about performance. But as I am usually not deleting this number of records I think in most cases using them should be fine. I would like to hear of any other arguments others may have against using them to make sure I've considered all options.

Anthony Griggs
  • 1,469
  • 2
  • 17
  • 39
  • 3
    The problem with cascade on delete is, if you ever delete a value from the referenced table accidentally, all rows that use that value will be deleted (not just the value in those rows but the whole row). Seems really stupid, but I just learned this the hard way today. – Holonaut Nov 22 '17 at 17:23
  • @Holonaut I see what your saying, that would be a problem. But I see the same problem arising if you accidently deleted a record programatically as well as I usually do with the click of the button. – Anthony Griggs Nov 22 '17 at 17:36
0

I'd add that ON DELETE CASCADE makes it difficult to maintain a copy of the data in a data warehouse using binlog replication which is how most commercial ETL tools work. Explicit deletion from each table maintains a full log record and is much easier on the data team :)

0

I actually agree with most of the answers here, YET not all scenarios are the same, and it depends on the situation at hand and what would be the entropy of that decision, for example:

If you have a deletion command for an entity that has multiple many/belong relationships with a large number of entities, each time you would call that deletion process you would also need to remember to delete all the corresponding FKs from each relational pivot that A has corrosponding relationships with.

Whereas via a cascade on delete, you write that once as part of your schema and it will ONLY delete those corresponding FKs and cleanup the pivots from relations that are no longer necessary, imagine 24 relations for an entity + other entities that would also have large number of relations on top of that, again, it really depends on your setup and what YOU feel comfortable with. In anycase just for FYIs, in an Illuminate migration schema file, you would write it as such:

            $table->dropForeign(['permission_id']);

            $table->foreign('permission_id')
                ->references('id')
                ->on('permission')
                ->onDelete('cascade');