Logical deletion with event sourcing (potentially with sensitive data / GDPR)

Question

I understand that event stores are supposed to be immutable and append-only.

However, I'm wondering how I should handle a logical delete. If the user clicks 'delete' in the UI and expects a hard delete, do I include an IsDeleted flag on my event? Are there other options here?

Edit: This question is of special interest when sensitive data is involved, perhaps stored in the event itself, and the user expects it to be completely flushed out of our systems. This can relate to the EU GDPR and related laws.

Vincent Hendriks · Answer 1 · 2018-06-26T15:21:17.893

3

You could publish a 'deleted' event which would remove the data from your read database or mark it as deleted there, but this isn't a hard delete (which you specify in your question). You will still have the data in your event store.

Hard deletes are actually pretty difficult when using event sourcing. I assume you're working with event sourced customer data? There are usually a few solutions for this, but they aren't really pretty:

Don't event-source your sensitive customer data; store it separately and just reference it from your aggregate in some way.
Delete old events (be aware that this might break more than you'd like, but it depends on your design / application).
Add a deleted event and change the existing events to strip out the sensitive data.
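
To make the first option above concrete (keeping sensitive data outside the event store and referencing it from the events), here is a minimal Python sketch. Everything in it is hypothetical: the `SensitiveStore` class, the `customer.registered` event type and the `details_ref` field are invented for illustration, and an in-memory dict and list stand in for a real database and event store.

```python
import uuid


class SensitiveStore:
    """Mutable side store for personal data (hypothetical; a dict stands in for a database)."""

    def __init__(self):
        self._docs = {}

    def put(self, data):
        ref = str(uuid.uuid4())
        self._docs[ref] = data
        return ref

    def get(self, ref):
        # Returns None once the document has been hard-deleted.
        return self._docs.get(ref)

    def hard_delete(self, ref):
        self._docs.pop(ref, None)


def project_customers(events, store):
    """Rebuild a read model; erased details show up as a placeholder."""
    customers = {}
    for event in events:
        if event["type"] == "customer.registered":
            details = store.get(event["details_ref"])
            customers[event["customer_id"]] = details if details is not None else "<data deleted>"
    return customers


store = SensitiveStore()
events = []  # append-only event log; entries are never modified after append

ref = store.put({"name": "Alice", "email": "alice@example.com"})
events.append({"type": "customer.registered", "customer_id": 1, "details_ref": ref})

before = project_customers(events, store)
store.hard_delete(ref)  # the only place where data is physically removed
after = project_customers(events, store)
```

The event log is never touched here: replaying the same events after the hard delete still works, and the projection simply has to cope with a reference that no longer resolves.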

edited Jun 26 '18 at 15:21

answered Jun 26 '18 at 14:43

Vincent Hendriks


I think this is pretty important in the case of "sensitive information" that was entered into the system accidentally. Say, for example, that you store human-written CRM entries that a phone operator registers during a call with a customer. You would expect something like `{ "eventId": 123, "type": "annotation.added", "content": "The customer wants to know if the elephants excursion is still available." }`. But what if the operator writes something like `"The customer says that his password 'myUsualPasswordInAllSites000' does not work"`? [continues in next comment] – Xavi Montero Aug 22 '18 at 08:27
Double error: the customer should never have said it, and the operator should never have written it down. But you can be sure that the customer would blame our company if we store it and a malicious person accesses the data, sees a plain-text password, tries it on some other sensitive account the customer holds with another company (say, a bank account), and it works. It wasn't the user who typed it; it was our operator. We need to ensure that sensitive info can really be erased one way or another. [continues in next comment] – Xavi Montero Aug 22 '18 at 08:30
The same happens, for example, if we import emails into our system over IMAP. What if we import an email that contains sensitive data (say, a boyfriend accidentally wrote personal things to one of our representatives but sent it to her company email address instead of her personal one)? [continues in next comment] – Xavi Montero Aug 22 '18 at 08:34

I definitely advocate for the first of Vincent's proposals: move the "sensitive data" into a separate "document store" that stores the data, calculates a hash of it (a `sha1`, for example) and uses that hash as an ID, so that the event only points to the data by its ID. In the extreme case you can remove the document but leave the event untouched. The event replayer will simply produce some kind of "data deleted" placeholder in the projection when it processes the event and the document data is not present. – Xavi Montero Aug 22 '18 at 08:36
Nevertheless, I would not make "document deletion" a matter of a single click. A click should only "append marks". The "real document deletion" should only be performed by system administrators after an audit confirms that we really hold data that we should never have stored. For the rest, just append "deletion events" and make your rebuilders process the projections accordingly, eliminating the row if there's no integrity violation, or leaving it tagged as "deleted" if you need references to it. – Xavi Montero Aug 22 '18 at 08:46
You can also store the sensitive data as a separate event stream if you really want the audit logging but also want to be able to delete the stream. Just use the same identifier for both streams. You can always delete an entire stream, just not single events. – Vincent Hendriks Aug 22 '18 at 08:47
But that wouldn't help for comments like the ones above. It depends on the situation, of course :) – Vincent Hendriks Aug 22 '18 at 08:49

What you propose about marking it for deletion is exactly what I proposed here for our current solution. You can then also keep the data until a certain point in time (GDPR) and then delete it permanently. – Vincent Hendriks Aug 22 '18 at 08:52
Hmmm... sounds interesting... yes, maybe events like "xxxx.sensitiveData.hardDeletionRequested" and "xxxx.sensitiveData.hardDeletionAuthorised" inside the event stream... It really sounds interesting, definitely. Those events could carry info about who audits and who authorises, whether it's a human, a cron job or a robot, and so on... interesting approach! – Xavi Montero Aug 22 '18 at 08:58
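
The two-step flow discussed in these comments could be sketched roughly as follows in Python. This is only an illustration under assumptions: the `annotation.` prefix (standing in for the "xxxx" placeholder), the `document_id` and `by` fields, and the `documents_to_purge` helper are all hypothetical. The function only decides which documents may be physically removed; the removal itself would happen in the separate document store.

```python
# Hypothetical event type names; "annotation" stands in for the "xxxx" placeholder.
REQUESTED = "annotation.sensitiveData.hardDeletionRequested"
AUTHORISED = "annotation.sensitiveData.hardDeletionAuthorised"


def documents_to_purge(events):
    """Return document IDs whose hard deletion was requested and then authorised."""
    requested = set()
    approved = []
    for event in events:
        if event["type"] == REQUESTED:
            requested.add(event["document_id"])
        elif event["type"] == AUTHORISED and event["document_id"] in requested:
            # An authorisation only counts if a request came first.
            approved.append(event["document_id"])
    return approved


events = [
    {"type": "annotation.added", "document_id": "doc-1"},
    {"type": "annotation.added", "document_id": "doc-2"},
    {"type": REQUESTED, "document_id": "doc-1", "by": "operator-7"},
    {"type": REQUESTED, "document_id": "doc-2", "by": "operator-7"},
    {"type": AUTHORISED, "document_id": "doc-1", "by": "admin-1"},
]

to_purge = documents_to_purge(events)
```

In this example only `doc-1` ends up in `to_purge`; `doc-2` has been requested but not yet authorised, so its data stays in place, and the stream itself records who asked and who approved.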

score 0 · Answer 2 · answered Jun 24 '18 at 11:30

do I include an IsDeleted flag on my event? Are there other options here?

If you are asking "Can I undo an event, by setting the isDeleted flag?"; no, that's not usually how we do it. Instead, we append a new event that "reverses" the effect of the first. You'll sometimes see this described as a compensating event. In mature domains (think accounting), there is often an explicit protocol for reversing events.
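
As a rough sketch of a compensating event, assuming a toy ledger in Python (the `amount.credited` and `amount.debited` event types and the `reverses` field are made up for this example):

```python
def balance(events):
    """Fold the event log into the current balance."""
    total = 0
    for event in events:
        if event["type"] == "amount.credited":
            total += event["amount"]
        elif event["type"] == "amount.debited":
            total -= event["amount"]
    return total


events = [
    {"id": 1, "type": "amount.credited", "amount": 100},
    {"id": 2, "type": "amount.credited", "amount": 40},  # entered by mistake
]

# Instead of editing or flagging event 2, append an event that reverses its effect.
events.append({"id": 3, "type": "amount.debited", "amount": 40, "reverses": 2})

current = balance(events)
```

After the reversal `current` is 100 again, yet event 2 is still in the log. A compensating event therefore undoes the effect of a mistake, but it does not erase the data the original event carried.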

score 0 · Answer 3 · answered Jun 25 '18 at 09:27

Usually you don't display events in your UI, you display your read model that is calculated from events.

For instance, you can have events

TASK_CREATED "one"

TASK_CREATED "two"

TASK_CREATED "three"

TASK_DELETED "two"

In the code that updates your read model (the list of tasks), you just add an item on each TASK_CREATED event and remove it on each TASK_DELETED event, so the resulting list would be:

"one"

"three"

So, if the user clicks 'delete' in the UI, it sends a DELETE_TASK command to the aggregate, the aggregate publishes a TASK_DELETED event, and this event is applied to the read model (removing the item from the list). Now when you query the read model, it will have one item fewer.
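
The read-model update described above can be sketched in a few lines of Python. The `project_tasks` function and the tuple representation of events are assumptions made for this illustration; only the event names come from the example.

```python
def project_tasks(events):
    """Build the task list (the read model) by replaying events in order."""
    tasks = []
    for event_type, name in events:
        if event_type == "TASK_CREATED":
            tasks.append(name)
        elif event_type == "TASK_DELETED" and name in tasks:
            tasks.remove(name)
    return tasks


events = [
    ("TASK_CREATED", "one"),
    ("TASK_CREATED", "two"),
    ("TASK_CREATED", "three"),
    ("TASK_DELETED", "two"),
]

tasks = project_tasks(events)
```

Here `tasks` ends up as `["one", "three"]`, while all four events, including the creation of "two", remain in the log.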

score 0 · Answer 4 · answered Jun 28 '18 at 21:12

As @Vincent Hendriks said, "You could publish a 'deleted' event which would remove/ mark the data as deleted in your read database".

Here is a very good example that demonstrates this concept: http://next.belus.com/Demos/Events

In the demo, click the Edit link and press the Delete button. At the bottom of the page, you can see the event that gets created.
