What is the best way to implement soft deletion?

Question

Working on a project at the moment and we have to implement soft deletion for the majority of users (user roles). We decided to add an is_deleted='0' field on each table in the database and set it to '1' if particular user roles hit a delete button on a specific record.

For future maintenance now, each SELECT query will need to ensure they do not include records where is_deleted='1'.

Is there a better solution for implementing soft deletion?

Update: I should also note that we have an Audit database that tracks changes (field, old value, new value, time, user, ip) to all tables/fields within the Application database.

Chris Cherry · Answer 1 · 2019-08-31T06:45:01.867

105

I would lean towards a deleted_at column that contains the datetime of when the deletion took place. Then you get a little bit of free metadata about the deletion. For your SELECT just get rows WHERE deleted_at IS NULL

edited Aug 31 '19 at 06:45

answered Sep 16 '08 at 00:50

Chris Cherry

28,118
6
68
71

32

this is how we do it -- deletion is always a date field – Jeff Atwood Sep 16 '08 at 01:11
1

Is it beneficial to index deleted_at in this scenario? – Nov 22 '13 at 08:16
2

I'd think that using a date field would be better than using a flag just because you can partition a date field in a variety of ways. – Sergey Akopov Sep 06 '14 at 03:30
1

Agree !!!! IMO, using datetime help us know more about time of deletion. Moreover, it make our code is readable and have some naming convention like: created_at, updated_at, deleted_at , created_by, updated_by, deleted_by. – Chung Jun 23 '15 at 05:33
1

@JeffAtwood and what happens when the row is "undeleted"? Don't (or didn't...) you store somewhere all this information? – gdoron Jul 25 '16 at 19:03

score 57 · Accepted Answer · answered Sep 16 '08 at 00:49

57

You could perform all of your queries against a view that contains the WHERE IS_DELETED='0' clause.

answered Sep 16 '08 at 00:49

David J. Sokol

3,456
3
31
25

12

don't forget to index that column ;) – Jiaaro Jan 04 '10 at 14:53
1

Consider partitioning by that key as well, that should have greater positive impact on performance than a simple index on a column that has low selectivity. – Chris Schaller Mar 03 '20 at 23:42

score 22 · Answer 3 · edited Apr 23 '19 at 08:57

Having is_deleted column is a reasonably good approach. If it is in Oracle, to further increase performance I'd recommend partitioning the table by creating a list partition on is_deleted column. Then deleted and non-deleted rows will physically be in different partitions, though for you it'll be transparent.

As a result, if you type a query like

SELECT * FROM table_name WHERE is_deleted = 1

then Oracle will perform the 'partition pruning' and only look into the appropriate partition. Internally a partition is a different table, but it is transparent for you as a user: you'll be able to select across the entire table no matter if it is partitioned or not. But Oracle will be able to query ONLY the partition it needs. For example, let's assume you have 1000 rows with is_deleted = 0 and 100000 rows with is_deleted = 1, and you partition the table on is_deleted. Now if you include condition

WHERE ... AND IS_DELETED=0

then Oracle will ONLY scan the partition with 1000 rows. If the table weren't partitioned, it would have to scan 101000 rows (both partitions).

-1 Why partition is not a table, or view? Was introducing one more concept really necessary; doesn't oracle already have documentation volumes rival Library of Congress? — Tegiri Nenashi, May 04 '11 at 16:41

score 16 · Answer 4 · answered Sep 16 '08 at 12:47

The best response, sadly, depends on what you're trying to accomplish with your soft deletions and the database you are implementing this within.

In SQL Server, the best solution would be to use a deleted_on/deleted_at column with a type of SMALLDATETIME or DATETIME (depending on the necessary granularity) and to make that column nullable. In SQL Server, the row header data contains a NULL bitmask for each of the columns in the table so it's marginally faster to perform an IS NULL or IS NOT NULL than it is to check the value stored in a column.

If you have a large volume of data, you will want to look into partitioning your data, either through the database itself or through two separate tables (e.g. Products and ProductHistory) or through an indexed view.

I typically avoid flag fields like is_deleted, is_archive, etc because they only carry one piece of meaning. A nullable deleted_at, archived_at field provides an additional level of meaning to yourself and to whoever inherits your application. And I avoid bitmask fields like the plague since they require an understanding of how the bitmask was built in order to grasp any meaning.

I agree, a "deleted_at" is much more preferable to a simple "is_deleted" as you get extra information for free. — BobbyShaftoe, Jan 02 '09 at 22:20

score 15 · Answer 5 · answered Sep 16 '08 at 00:49

15

if the table is large and performance is an issue, you can always move 'deleted' records to another table, which has additional info like time of deletion, who deleted the record, etc

that way you don't have to add another column to your primary table

answered Sep 16 '08 at 00:49

Jiaaro

74,485
42
169
190

3

I actually really like that idea because then my main table isn't cluttered with thousands of soft deleted records. How would you handle child and dependent records, though? Would you basically recreate the same table dependencies (such as for an order and order items)? – Arthur Chaparyan Feb 03 '09 at 09:06
2

if performance is an issue, you could partition the table by is_deleted or deleted_at or any to speed up queries. – sonam Aug 15 '13 at 01:44
The issue with moving deleted records to a new table, is if you have any other records which have foreign keys, this won't work. For instance, let's say you have a user record in one table, and user_address records in another table, with a foreign key for user_id in user_address. If you need to retain all of the information about the user, including their addresses, you would have to have "user_address_deleted" as well and "deleting the user" becomes a challenge. You have to copy to deleted table all records in both tables, and then delete the original records in both. – bpeikes Dec 02 '22 at 03:27

score 12 · Answer 6 · answered Sep 16 '08 at 00:57

That depends on what information you need and what workflows you want to support.

Do you want to be able to:

know what information was there (before it was deleted)?
know when it was deleted?
know who deleted it?
know in what capacity they were acting when they deleted it?
be able to un-delete the record?
be able to tell when it was un-deleted?
etc.

If the record was deleted and un-deleted four times, is it sufficient for you to know that it is currently in an un-deleted state, or do you want to be able to tell what happened in the interim (including any edits between successive deletions!)?

score 10 · Answer 7 · edited Jun 27 '23 at 01:55

10

Careful of soft-deleted records causing uniqueness constraint violations. If your DB has columns with unique constraints then be careful that the prior soft-deleted records don’t prevent you from recreating the record.

Think of the cycle:

create user (login=JOE)
soft-delete (set deleted column to non-null.)
(re) create user (login=JOE). ERROR. LOGIN=JOE is already taken

Second create results in a constraint violation because login=JOE is already in the soft-deleted row.

Some techniques:

Move the deleted record to a new table.
Make your uniqueness constraint across the login and deleted_at timestamp column

My own opinion is +1 for moving to new table. It takes lots of discipline to maintain the AND delete_at = NULL across all your queries (for all of your developers)

edited Jun 27 '23 at 01:55

Pang

9,564
146
81
122

answered May 04 '11 at 14:16

Andy Rappaport

101
1
2

3

Hi Andy, I met exactly your circumstance. But I don't think that moving to another table is good for long maintenance. Option 2 seems to be better.. Why don't you recommend it? I'm just curious. – Hoàng Long Sep 26 '12 at 04:56
1

There's another approach where for deleting a row, you copy the unique key into a soft-deletion column (of the same type) and make the original null. – mrmashal Jan 21 '18 at 08:18
@HoàngLong I don't think Option 2 is a solution, as it wouldn't be possible to delete the second JOE. – Nicholas Pipitone Jul 17 '18 at 20:19
1

@NicholasPipitone why not? Note that we made the constraint on (login, delete_at_timestamp), NOT (login, is_deleted) – Hoàng Long Jul 18 '18 at 02:05
2

@HoàngLong I just tested it, and it doesn't work - but for an interesting reason. Apparently MySQL doesn't enforce unique constraints on NULL columns, only non-NULL columns. So Option 2 ends up not working. – Nicholas Pipitone Jul 20 '18 at 17:26
@mrmashal so you need a soft deletion column for each other column? sounds inefficient! – Menasheh Aug 19 '18 at 14:32
@Menasheh I think that, in most cases, there is no more than one alternate key column with a unique index, and we can declare its index to be unique only on non-null values. In this way, we can implement soft-deletion with the one extra column that holds the copy of the alternate key. We might instead be interested in a nullable string column, holding a JSON, etc containing any backup or additional information that we need to store somewhere, as the soft-deletion column. – mrmashal Aug 19 '18 at 18:11
@NicholasPipitone, not only MySQL: "Any attempt to insert a duplicate entry will result in an error. For the purposes of unique indices, all NULL values are considered different from all other NULL values and are thus unique. This is one of the two possible interpretations of the SQL-92 standard (the language in the standard is ambiguous) and is the interpretation followed by PostgreSQL, MySQL, Firebird, and Oracle. Informix and Microsoft SQL Server follow the other interpretation of the standard." From https://sqlite.org/lang_createindex.html – selurvedu Aug 23 '19 at 21:22
@NicholasPipitone you can use UNIX timestamp for is_deleted column, which 0 means not deleted, this way there will be no null rows. ref: https://medium.com/@BBreyten/using-unique-fields-and-soft-deletes-fe37e7c47ce3 – cece2048 Jan 06 '21 at 09:33

Brent · Answer 8 · 2008-09-16T01:33:11.513

5

You will definitely have better performance if you move your deleted data to another table like Jim said, as well as having record of when it was deleted, why, and by whom.

Adding where deleted=0 to all your queries will slow them down significantly, and hinder the usage of any of indexes you may have on the table. Avoid having "flags" in your tables whenever possible.

edited Sep 16 '08 at 01:33

answered Sep 16 '08 at 00:57

Brent

23,354
10
44
49

score 2 · Answer 9 · answered Sep 16 '08 at 14:15

2

you don't mention what product, but SQL Server 2008 and postgresql (and others i'm sure) allow you to create filtered indexes, so you could create a covering index where is_deleted=0, mitigating some of the negatives of this particular approach.

answered Sep 16 '08 at 14:15

Andy Irving

2,657
1
14
11

score 1 · Answer 10 · edited Sep 08 '19 at 22:32

1

Use a view, function, or procedure that checks is_deleted = 0; i.e. don't select directly on the table in case the table needs to change later for other reasons.

And index the is_deleted column for larger tables.

Since you already have an audit trail, tracking the deletion date is redundant.

edited Sep 08 '19 at 22:32

MarredCheese

17,541
8
92
91

answered Sep 16 '08 at 03:25

Steven A. Lowe

60,273
18
132
202

score 1 · Answer 11 · answered Sep 16 '08 at 05:31

Something that I use on projects is a statusInd tinyint not null default 0 column using statusInd as a bitmask allows me to perform data management (delete, archive, replicate, restore, etc.). Using this in views I can then do the data distribution, publishing, etc for the consuming applications. If performance is a concern regarding views, use small fact tables to support this information, dropping the fact, drops the relation and allows for scalled deletes.

Scales well and is data centric keeping the data footprint pretty small - key for 350gb+ dbs with realtime concerns. Using alternatives, tables, triggers has some overhead that depending on the need may or may not work for you.

SOX related Audits may require more than a field to help in your case, but this may help. Enjoy

score 0 · Answer 12 · answered Jun 27 '11 at 11:15

Create an other schema and grant it all on your data schema. Implment VPD on your new schema so that each and every query will have the predicate allowing selection of the non-deleted row only appended to it. http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/cmntopc.htm#CNCPT62345

score 0 · Answer 13 · answered Sep 16 '08 at 01:09

0

I prefer to keep a status column, so I can use it for several different configs, i.e. published, private, deleted, needsAproval...

answered Sep 16 '08 at 01:09

UnkwnTech

88,102
65
184
229

score -1 · Answer 14 · edited Sep 08 '19 at 22:08

-1

@AdditionalCriteria("this.status <> 'deleted'")

put this on top of your @entity

http://wiki.eclipse.org/EclipseLink/Examples/JPA/SoftDelete

edited Sep 08 '19 at 22:08

MarredCheese

17,541
8
92
91

answered May 21 '14 at 14:26

Kalpesh Soni

6,879
2
56
59

What is the best way to implement soft deletion?

14 Answers14

Linked

Related