1

I need some wisdom shed on the controversy over soft-deletes vs hard-deletes in a relational-database-centric application. Commentaries like this, this, and this all discourage soft-deletes as basically a hack, a dirty, easy way out, and claim that hard-deletes are the better way.

I'd love to agree. But nobody seems to mention how to deal with one major, inescapable problem: how can one hard-delete a row when a foreign key constraint prevents its deletion, and when cascade-deleting is not an option?

Essentially, if you want to hard-delete entity A, and entity B is pointing to it, and for business reasons B simply cannot be removed, doesn't that force you to use some form of soft-delete (i.e. keep the record right where it is)?

If you have a customer "Acme" who has made transaction 10001, you can't simply just hard-delete Acme because the virtually any business would require that:

  1. No transaction can be deleted
  2. A transaction must point to the customer that made it

Regardless of how Acme got "deleted" (hard, soft, over-medium) a sales report, for instance, would still have to show that Transaction #10001 was made on 2010-05-01 by customer Acme--even if Acme is no longer around. So it requires that Acme be still stored at least somewhere in the system.

One thread keeps coming up from the advocates of hard-delete: use a proper audit table, and do things like keep a serialized copy of the deleted object in there. But we can't redirect the FK of Transaction #10001 to this particular row in the audit table. What then? Create an "OldCustomers" table that mirrors the schema of "Customers" and somehow redirect every FK in the system that pointed to Acme there? I'm no DB expert so I don't know if that's even possible, but even if it were, now all your reporting queries have to factor in two tables (sounds even worse than having to append AND IsDeleted = false to all queries)

In short, I feel like I'm missing something because hard- and soft-deletes are debated--implying that they are both feasible options. But as I understand it, in 95% of business cases, there is no possibility for using hard-delete (on a technical level, due FK constraints, and on a logical level, simply because a thing can't just disappear when another thing points to it)

Community
  • 1
  • 1
BCA
  • 7,776
  • 3
  • 38
  • 53
  • 1
    you're basically asking for opinions. there's no right/wrong way. there's only what meets your business requirements. anyone who says "use X, because it's the only proper way" is full of warmed-leavings-of-male-cow" – Marc B Oct 20 '16 at 21:51
  • 1
    @MarcB No, I'm not asking for opinions, rather, I'm asking how it is even possible (both in a technical sense and a logical sense) how hard-deletes can work when there are constraints involved. – BCA Oct 21 '16 at 00:40
  • Re "if you want to hard-delete entity A, and entity B is pointing to it, and for business reasons B simply cannot be removed": Don't you mean, A simply can't be removed? – philipxy Oct 21 '16 at 04:23
  • @philipxy: no, I mean B can't be removed via some cascade delete. A (a customer) is to be removed, but B (that customer's transaction) cannot be removed for obvious business reasons. If the customer record were simply to vanish, even if we had a nullable FK column in the Transactions table, is it acceptable that that particular transaction should just have NULL for it's FK value? When you query that row, it basically indicates "there is no customer associated with this transaction"? That's also unacceptable from a business perspective – BCA Oct 21 '16 at 15:20

3 Answers3

1

I think one way to get the hard delete to work and at the same time keep some kind of record is to not use FK for the relation, but just an identifier field that works like FK but does not quarantee referential integrity. Then you just need to handle the case when the user is no longer found with that id.

Ville Miekk-oja
  • 18,749
  • 32
  • 70
  • 106
  • Yeah, I seem to be leaning towards your approach. It seems to be the way things are going these days, with NoSQL document DBs where there are no such things as FKs (if I'm not mistaken) – BCA Mar 22 '21 at 15:07
0

TL;DR Appropriate tables & constraints arise as part of proper modeling. Your question incorrectly assumes limitations on what the tables & constraints are going to look like.


From a comment of yours:

is it acceptable that that particular transaction should just have NULL for it's FK value? When you query that row, it basically indicates "there is no customer associated with this transaction"

Well, there is no current customer associated with this transaction. If there is a past customer that is associated with this transaction that you want recorded then you have to record it. Ditto for any other data about the customer or the transaction. Proposals to hard delete assume that you keep desired historical data. Do not limit your thinking about achieving this to merely nulling FKs, cascading, adding a flag/date column to an extant table or anything else. Properly model both present & past including database changes that need to occur as a DBMS transaction upon each chosen application situation change. Proposals to soft-delete just involve putting certain current and historical data into the same table vs different ones. This only works for very simple models of current & historical situations.

It is usually straightforward to design a database for only a current application situation. But if we do care about the past we typically only care about some of it. If so, upon certain application situation changes from current to past we can copy a snapshot of the relevant current state into historical state. Labeling data with soft-delete flags vs dates is the combined-table version of undated vs dated historical data, where we only care about current vs past situations and we only care that vs when a change occurred.

"Temporal" databases more or less record the current situation and a bunch of dated once-current situations. This recording of past data using the structure for current data simplifies understanding & querying of current & past data. (The querying about intervals of time that a temporal database can facilitate can get quite complicated.) But it turns out that making a temporal version of a given current-data design does not just involve adding date columns to extant current-data tables. It requires remodeling current data, breaking it into smaller tables with more constraints. This is because different kinds of application situation changes require dating different column combinations of the extant current-data design. (Hard and soft historical snapshot designs must address this, but for a limited past/history.)

The idea behind "no soft deletes" is that it is safer to have two distinct contexts each restricting the mistakes that users and implementers can make. The idea behind "historical" vs "temporal" is that it can be way simpler to model and query a limited predetermined scope of past application situations. Yet the designs for each approach option arise naturally when you apply sound design principles in modeling whatever amount of data about your application's current and past situations you decide you want.

PS If you want to understand temporal databases, read Lorentzos, Date & Darwen and avoid Snodgrass.

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • First of all, I agree with you that ideally I want the transaction to somehow indicate that Acme *was* the customer, not *is*, if it were hard-deleted. But are you saying that this is not possible without venturing into the realm of temporal databases? It looks like a relatively new technology (just introduced to SQL Server in v2016), and I was hoping for a more time-tested approach. Also, it seems like complexity overkill; in this hypothetical use case I don't need to insert new records upon each change to Acme's profile, but rather a very simple either-active-in-the-system-or-not – BCA Oct 22 '16 at 15:41
  • Re "But are you saying"? No, I said more than once that you need to *make a design of the current and past data you want*, including identifying how to move data from the current to the past appropriately on changes. I also said temporal was typically overkill. I also said only *very simple* cases can get away with some bit(s) being "active or not". I'll add a customer-transaction example when I can but you can try it right now. Make a design of the current and past data you want. Then you will see what the hard version of your (possibly) very simple soft case looks like. – philipxy Oct 22 '16 at 22:30
  • We are probably misunderstanding each other. There is no reason for the design of *past* and *current* customer records to be any different. If a customer is a past customer, they can still keep the same data in the address columns, for example. – BCA Oct 24 '16 at 15:42
  • Your arguments for "there is no possibility for using hard-delete" are based on wrong notions. Model current & historical state *that you care about*; then tell the DBMS that certain states *in that design* can't arise by declaring appropriate FKs. Try it. See the TL;DR. Good luck. – philipxy Oct 24 '16 at 19:45
  • I am not grasping what you are trying to describe. Can you point to any article or page that demonstrates this concept? – BCA Oct 26 '16 at 00:42
  • I'll give examples when I can, but seriously, *try some examples* of soft vs hard delete of data of interest. Table meanings and possible situations determine the FKs, not vice versa. You claim there's "no possibility" of saving whatever you want about the present & past? Have you *tried*? Be sure to give each table's *predicate*: a statement template parameterized by column names that a row in a table has made into a true statement, and a row not in a table has made into a false statement. PS Search/google re "historical data". Eg http://stackoverflow.com/q/3874199/3404097. – philipxy Oct 26 '16 at 02:00
0

I use a temporal design which make extensive use of soft deletes. Or get the current data by executing the same query "as of" now. The design allows for hard deletes (rarely used), soft deletes (most often used) and firm deletes (like soft deletes but not allowed to undo). This design allows one to execute a query "as of" a date and return the same result that would have been returned if the query had been executed on that date.

Needless to say, this extremely difficult if "old" data and "current" data are contained in different tables.

Looking over your references, I noticed there was a large concern for performance, yet there were no actual examples of performance hits and the first reference even admitted that "[i]n practice, filtering inactive rows doesn’t cost too much in itself."

This matches my testing of "versioned" tables with millions of versions of 100s of thousands of entities.

I get around the complexity issue by supplying views which exposes only current data and others to expose the entire history (and other views as needed). This is no problem for me as I tend to have apps access views instead of tables anyway.

The general complaint, when you get right down to it, is that using soft deletes makes the db developers' jobs more difficult. This is true. But our job is to make the users' jobs easier -- not our own.

My daughter is a financial analyst at a large bank. She often tells me how much easier her job would be if she could look back in time to see what the data looked back at some particular date in the past. There are archives, of course. But typically this only shows the final state of the data when it was archived. She can't run through the progression as accounts are opened, go through changes and finally get closed. (These are large corporate retirement accounts, not regular checking or savings accounts.)

But she can't get her IT dept interested because it would "be too much work." Sad.

TommCatt
  • 5,498
  • 1
  • 13
  • 20