7

The Story

I'm going to write up some code to manage the deleted items in my application, but I'm going to soft delete them so I could return them back when I need. I have a hierarchy to respect in my application's logic when it comes to hiding or deleting items.

I logically place my items in three containers to the country, city, district and brand. Each item should belong to a country, a city, a district and a brand. Now, if I deleted a country it should delete the cities, districts, brands, and items that belongs to the given country. and if I deleted the city it should also delete the whole stuff under it (districts, brands, etc)


A Note

When I delete a country and delete the associated brands, I should take care that a brand might have items in more than one country.


The Question

Do you suggest to

  1. Flag the items (whether it's country, city, item, etc) as deleted and this will require a lot of code to check every time when any item is loaded from the database, if it's deleted or not and also some extra fields to mark if the city it belongs to is deleted, and the country it belongs to is deleted and so on.

  2. Move the deleted stuff each to a specific table (DeletedCountries, Deleted Cities, etc) and save the the IDs of the items it was associated with so I could insert them back later to it's original table. and of course this will save my application all the code that will manage to check all the deleted items and make sure all the hierarchy is deleted.

  3. Maybe you have a better approach/advice/idea about achieving such a thing!

Mazen Elkashef
  • 3,430
  • 6
  • 44
  • 72
  • 2
    Because of the hierarchy issue, I'd actually opt for the less-optimal #2. It'll save you coding grief with the permanent `where isdeleted<>1` annoyance. (which probably will speed up sql server a tiny bit) – Earlz Mar 09 '11 at 04:01

7 Answers7

4

For argument's sake, one advantage of solution #2 (moving deleted items to their own tables) is if you have lots and lots of records, you would not have to worry about indexing records in respect to their "deleted" state.

With that said, if I were going to "move" data from table to table (via delete followed by insert) I would make sure to do it in 1 transaction.

dana
  • 17,267
  • 6
  • 64
  • 88
  • so dana you think I should go with option #2, but your advice is to do the insert and the delete in one transaction, you're saying to have a stored procedures that have both the insert and the delete statement ? – Mazen Elkashef Mar 09 '11 at 04:08
  • Indeed. The reason I say this is because if your code does an `insert` without the corresponding `delete` (or vice versa) your database will be in an inconsistent state. Using transactions, you can commit or rollback a series of sql statements. The last stored procedure on this page (http://www.4guysfromrolla.com/webtech/041906-1.shtml) has an example of this using try / catch. – dana Mar 09 '11 at 04:21
  • How referential integrity is handled in such case for elements belonging to a deleted brand ? Even when DELETE and INSERT are commited in a single transaction, FOREIGN KEY references could not be present so data would be vulnerable for inconsistencies. – too Mar 09 '11 at 04:49
4

I'm using a technique right now where we are storing a 'DeleteDate' on every user maintained table in our database. The DeleteDate field is a smalldatetime data type with a default value of 6/1/2079

Coupled with an index on the DeleteDate field, we are able to use a standard View or User-Defined-Function to return only the 'current' records (that is, those records with a delete date in the future). All queries route through this index when looking for current data, and deletes become a trivial update query.

There are some additional logic checks that need to be done for related tables. But that is part of the price of having to never worry about a user 'accidentally' deleting valuable data.

In the future, when these tables are excessively large and there are a lot of deleted records present, we can partition the table first on the DeleteDate. This will move all 'deleted' records away from the 'live' records.

Jeff Fritz
  • 9,821
  • 7
  • 42
  • 52
  • +1 For pointing out how to deal with really-deleted items. I use a similar approach in a current project. –  Mar 09 '11 at 04:55
  • I'm interested in more information on this solution, maybe a reference to a specific example in a blog post or something. I would like to read more about how this solution works over time, how to "route [all queries] through this index", how it interacts with ORMs that want to query and update the same object as opposed to views for queries and tables for updates. – flipdoubt Apr 27 '16 at 17:49
1

Flagging an item as delete really complicates the information retrieval, and also, you need to deal with cascade remove by yourself.

I would choose the "mail box" approach, that move deleted records to different table. I have done a project that use soft-delete, and I end up put all delete calls to Stored Procedure and handle the copy and remove in Stored Procedure.

xandy
  • 27,357
  • 8
  • 59
  • 64
1

You should manage your hierarchy by tagging all subitems as deleted. This way if your eg. product belongs to a brand, you can check only if brand is deleted. You should also put your logic on data retrieval side, to avoid unnecessary gathering of deleted information.

SELECT
  *
FROM
  products p,
  category c
WHERE
  p.catId = c.Id
  AND NOT c.Deleted

And above all, information about deleted category should be indexed.

CREATE PRIMARY INDEX ON category (Id)
CREATE INDEX ON category (Deleted)

or

CREATE INDEX ON category (Id, Deleted)
too
  • 3,009
  • 4
  • 37
  • 51
  • but this will require me to build a whole new DAL for the administration page, that will return all the data (regardless it's deleted or not) – Mazen Elkashef Mar 09 '11 at 04:13
  • You should design your DAL with handling of deleted column in mind. If you do, the difference is only a boolean flag passed which determines if you want to show deleted elements or not. – too Mar 09 '11 at 04:20
  • concerning your edit, the Id is already a primary key which is by default an Index .. so I should index the deleted only! .. I'm sorry it's off this question's topic but you got me confused about this right now! – Mazen Elkashef Mar 09 '11 at 04:21
  • You're right, I've just modified the example, the code is just to illustrate the idea, it's not SQL Server specific. – too Mar 09 '11 at 04:24
  • I meant that the Id column in my database is already a primary key, doesn't it mean that it's already using a cluster index/do I have to add an index to it ?? .. and btw thanks for the index tip +1 – Mazen Elkashef Mar 09 '11 at 04:29
  • If the column Id already have an index, don't add it - CREATE PRIMARY INDEX/KEY is just to illustrate the need for it when you don't create multiple column index. The latter is preferable when you check for delete everywhere in application. But apparently in administration you don't so having separate indexes for Id and Deleted will be better. – too Mar 09 '11 at 04:38
0

I think to flag the item is the best approach and even i also use the mail approach for the purpose of soft delete.

Yea that requires much stuff to take in account to manage but yet i didn't found any other way. I just add the one extra column to each and every table that is Status whose datatype is bit.

Thanks

Bhavik Goyal
  • 2,786
  • 6
  • 23
  • 42
  • But doesn't it affect the application's performance to check the item before it's displayed "every time" whether it's deleted or not! and also to manage the hierarchical structure that I have, it won't stop at just a single column, I'll have to add to the objects that are less in the hierarchy an extra column to check if it's parent is deleted so It should also be deleted! .. could you let me what do you think about my second approach :) ? and thanks a lot for your time! – Mazen Elkashef Mar 09 '11 at 03:54
  • Ya it will affect some performance, but you can manage the performance depending upon your application strength. But i think if you will add the record in other table to make the entry delete then it will create more overhead then this. – Bhavik Goyal Mar 09 '11 at 03:58
  • I'm sorry, I don't understand how come having a separate table will be an overhead! .. I'll delete an item, it's all moved to another table and I don't have to check the items when I display them! .. I'll pay the cost in performance when moving a lot of stuff but this will performance will appear to me (the administrator) and it's okay for the owner to face some delays, in order to let my visitors have a better experience (a faster application) .. Am I missing something or thinking the wrong way ? – Mazen Elkashef Mar 09 '11 at 04:03
0

How complex a delete technique are you asking for?

With just one date field and no audit log, you can have an instant deleted flag. If datefield is null, then it's not been deleted. You can then use that datefield on the index (if the index allows nulls).

If you want something more complex, then you could use extra tables. Will you allow it to be deleted, undeleted, redeleted and maintain a record of each of those? If so, keep a separate table for action logging and keep only the one record with a boolean field (actually a join on that table might be faster, depends on the data)

jcolebrand
  • 15,889
  • 12
  • 75
  • 121
0

If you often reconsistute the items, flagging is a preferable means, but you end up having to alter your data access to avoid showing the items that are flagged, which can be rather painful if you have already set up a lot of code accessing your data, so moving may be better if you have a lot of "legacy" code accessing the data. If it is rare, and you are also interested in a history log, moving to another database table works well.

One easy way to achieve either is to use a trigger that changes the delete row and does the operation. If you actually do need to delete items, however, the flag option becomes a royal PITA when you flag rather than move items. The reason a trigger is easier in many cases is you capture every delete, not just those that are initiated by code.

Gregory A Beamer
  • 16,870
  • 3
  • 25
  • 32