
Our application creates/updates database entries based on an external service's webhooks. Each webhook sends the external ID of the object so that we can fetch more data for processing. Processing a single webhook, including the round trips to fetch that extra data, takes 400-1200 ms.

Sometimes, multiple hooks for the same object ID are sent within microseconds of each other. Here are timestamps of the most recent occurrence:

2020-11-21 12:42:45.812317+00:00
2020-11-21 20:03:36.881120+00:00 <-
2020-11-21 20:03:36.881119+00:00 <-

Other objects can be sent for processing around this time as well. The issue is that concurrent processing of the two hooks highlighted above creates two new database entries for the same single object.

Q: What would be the best way to prevent concurrent processing of the two highlighted entries?

What I've Tried: Currently, at the start of an incoming hook, I create a database entry in a Changes table which stores the object ID. Right before processing, the Changes table is checked for entries created for this ID within the last 10 seconds; if one is found, the handler quits to let the other process do the work.
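Roughly, the flow looks like this (a minimal sketch; the Change model, its created_at field, and the process function are assumed names, not our real code):

from datetime import timedelta
from django.utils import timezone

def handle_hook(object_id):
    # Record that a hook arrived for this object.
    Change.objects.create(object_id=object_id)

    # Check whether any other hook for this ID arrived in the last 10s.
    window_start = timezone.now() - timedelta(seconds=10)
    recent = Change.objects.filter(
        object_id=object_id,
        created_at__gte=window_start,
    ).count()

    if recent > 1:
        return  # quit; let the other process do the work

    process(object_id)  # 400-1200 ms of round trips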

In the case above, two database entries were created, and because they were so close in time, both hit the detection spot at the same moment, found each other, and quit, resulting in nothing being done.

I've thought of adding some jittered timeout before the check (increases processing time), or locking the table (again, increases processing time), but it all feels like I'm fighting the wrong battle.

Any suggestions?

Our API is Django 3.1 with a Postgres database.

sean.hudson
  • Here's a good summary of the underlying issue: https://postmarkapp.com/blog/why-idempotency-is-important The processing code is not idempotent - however, because it's based on the object ID sent, I'm not sure I can make it idempotent. It's a chicken-and-egg scenario. I'm going to head in the direction of locking the Changes table while the ID is saved so IDs cannot be concurrently saved. – sean.hudson Nov 21 '20 at 21:44
  • Can you show us the docs for this webhook? Surely the API provides some kind of *transaction id* – Lord Elrond Nov 21 '20 at 22:12
  • Yeah they don't include a transaction ID: https://developers.acuityscheduling.com/docs/webhooks – sean.hudson Nov 22 '20 at 21:31

2 Answers


Okay, this might not be a very satisfying answer, but it sounds to me like the root of your problem isn't with your own app so much as with the webhook service you're receiving from.

Due to the inherent possibility of error in network communication, webhook systems that guarantee delivery always use at-least-once semantics. A sender that hits a failure that leaves receipt uncertain has to send the webhook again, even though it may have been received the first time, which opens the possibility of a duplicate event.

By extension, all webhook-sending services should offer some way of deduplicating an individual event. I help run our webhooks at Stripe, and if you're using those, every webhook sent comes with an event ID like evt_1CiPtv2eZvKYlo2CcUZsDcO6, which a receiver can use for deduplication.

So the right answer for your problem is to ask your sender for some kind of deduplication/idempotency key, because without one, their API is incomplete.

Once you have that, everything gets really easy: you create a unique index on that key in the database, and then use an upsert to guarantee only a single entry. That would look something like:

CREATE UNIQUE INDEX index_object_changes_idempotency_key ON object_changes (idempotency_key);

INSERT INTO object_changes (idempotency_key, ...) VALUES ('received-key', ...)
    ON CONFLICT (idempotency_key) DO NOTHING;
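If you'd rather stay in Django's ORM, get_or_create gives you the same protection plus a flag telling you whether you won the race (a sketch, assuming an ObjectChange model whose idempotency_key field carries the unique index above):

change, created = ObjectChange.objects.get_or_create(
    idempotency_key=received_key,  # the key supplied by the sender
    defaults={"object_id": object_id},
)
if not created:
    return  # duplicate delivery; another worker already has it

get_or_create catches the IntegrityError from a concurrent insert and re-fetches the existing row, so only one caller ever sees created=True.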

Second best

Absent an idempotency ID for deduping, all your solutions are going to be hacky, but you could still get something workable together. What you've already suggested of trying to round off the receipt time should mostly work, although it'll still have the possibility of losing two events that were different, but generated close together in time.

Alternatively, you could also try using the entire payload of a received webhook, or better yet, a hash of it, as an idempotency ID:

CREATE UNIQUE INDEX index_object_changes_payload_hash ON object_changes (payload_hash);

INSERT INTO object_changes (payload_hash, ...) VALUES ('<hash_of_webhook_payload>', ...)
    ON CONFLICT (payload_hash) DO NOTHING;
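Computing that hash on the Django side is a couple of lines (a sketch; it hashes the raw request body, before any parsing can reorder keys):

import hashlib

def payload_hash(request):
    # Retries of the same delivery carry identical bytes, so they
    # collapse to the same digest.
    return hashlib.sha256(request.body).hexdigest()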

This should keep the field relatively small in the database, while still maintaining accurate deduplication, even for unique events sent close together.

You could also do a combination of the two: a rounded timestamp plus a hashed payload, just in case you were to receive a webhook with an identical payload somewhere down the line. The only thing this wouldn't protect against is two different events sending identical payloads close together in time, which should be a very unlikely case.
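A sketch of that combined key, assuming a 10-second bucket like the one in your existing check:

import hashlib
import time

def dedup_key(request, bucket_seconds=10):
    # Round the receipt time down to the nearest bucket and append a
    # digest of the payload. Retries of the same delivery within the
    # bucket map to the same key; different payloads never do.
    bucket = int(time.time()) // bucket_seconds * bucket_seconds
    digest = hashlib.sha256(request.body).hexdigest()
    return f"{bucket}-{digest}"

The key would then go into a unique column exactly like the ones above.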

Brandur
    Yes, you've hit the nail on the head with the core issue, so I'm trying to build guardrails around an incomplete webhook event system. Thanks for all your work at Stripe! I love working with the Stripe API. I'll look into doing a hybrid version. As you've mentioned, the problem is if 2 change events are sufficiently close, I will sync our DB with the slightly older version. – sean.hudson Nov 22 '20 at 21:18

If you look at the Acuity webhook docs, they supply a field called action, which is key to making your webhook handler idempotent. Here are the quotes I could salvage:

action: either scheduled, rescheduled, canceled, changed, or order.completed, depending on the action that initiated the webhook call

The different actions:

  • scheduled is called once when an appointment is initially booked
  • rescheduled is called when the appointment is rescheduled to a new time
  • canceled is called whenever an appointment is canceled
  • changed is called when the appointment is changed in any way. This includes when it is initially scheduled, rescheduled, or canceled, as well as when appointment details such as e-mail address or intake forms are updated.
  • order.completed is called when an order is completed

Based on the wording, I assume that scheduled, canceled, and order.completed are all unique per object_id, which means you can use a unique together constraint for those messages:

class AcquityAction(models.Model):
    # One row per webhook action type ('scheduled', 'canceled', ...).
    id = models.CharField(max_length=17, primary_key=True)

class AcquityTransaction(models.Model):
    action = models.ForeignKey(AcquityAction, on_delete=models.PROTECT)
    object_id = models.IntegerField()

    class Meta:
        # Record each (object, action) pair at most once.
        unique_together = [['object_id', 'action_id']]

You can swap the AcquityAction model out for an enumeration field if you'd like, but I prefer having the actions in the DB.

I would ignore the changed event entirely, since it appears to trigger on every event, according to their vague definition. For the rescheduled event, I would create a model that lets you put a unique constraint on the new date, so something like this:

class Reschedule(models.Model):
    schedule = models.ForeignKey(MyScheduleModel, on_delete=models.CASCADE)
    schedule_date = models.DateTimeField()

    class Meta:
        # The same appointment can't be rescheduled to the same time twice.
        unique_together = [['schedule', 'schedule_date']]

Alternatively, you could have a task specifically for updating your schedule model with the rescheduled date; that way it remains idempotent.
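That task could look something like this (a Celery-style sketch; the task name and the schedule_date field on MyScheduleModel are assumed):

from celery import shared_task

@shared_task
def apply_reschedule(schedule_id, new_date):
    # Writing the date carried by the webhook is naturally idempotent:
    # replaying the same event just writes the same value again.
    MyScheduleModel.objects.filter(pk=schedule_id).update(
        schedule_date=new_date,
    )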

Now in your view, you will do something like this:

from django.db import IntegrityError
from django.http import HttpResponse

ACQUITY_ACTIONS = {'scheduled', 'canceled', 'order.completed'}

def webhook_view(request):
    validate(request)
    action = get_action(request)

    if action in ACQUITY_ACTIONS:
        try:
            insert_transaction()
        except IntegrityError:
            # Duplicate delivery: a row for this (object, action) already
            # exists, so acknowledge the webhook and stop.
            return HttpResponse(status=200)

        webhook_task.delay()

    elif action == 'rescheduled':
        other_webhook_task.delay()

    ...
Lord Elrond
  • My apologies, but while I'm using the Postmark service, the specific issue is not with this service. The service in question does not use event style ids, only pings our API with the message "id 12345 has changed." If 12345 is modified multiple times, we get the same "id 12345 has changed" message, no distinguishing between the events. – sean.hudson Nov 21 '20 at 22:59
  • This does give me an idea though; I could create my own message key based on the datetime stamp, round it to the nearest 10 seconds, and append the id. So the MyTransaction id would be something like "2020-11-21-20:03:30-12345" as an example. This would ensure no duplicate webhook could be processed in a 10 second interval. – sean.hudson Nov 21 '20 at 23:03
  • 1
    @sean.hudson That would work, but I would recommend keeping the current time as a date column, and using some kind of check constraint using the `object_id` and current time. I've never used this before, but [gist](https://stackoverflow.com/questions/19504727/preventing-adjacent-overlapping-entries-with-exclude-in-postgresql/19505869#19505869) looks like a good starting point – Lord Elrond Nov 21 '20 at 23:16
  • This is a great idea! There are still issues around webhooks being sent "at least once," but you can navigate them much better. At this point, the occurrence of the issue isn't enough to justify a rewrite of the current implementation - I'm just using "changed" to make inserts/updates as they come. It's crude and likely results in overhead, but it works without causing headaches. Thanks for the detailed answer! I've gone with {POSIX timestamp}-{id}, slightly rounded off, and it's performing well enough for now. – sean.hudson Nov 23 '20 at 04:50
  • Since this was the biggest help in generating my final answer, I've selected it. – sean.hudson Nov 23 '20 at 04:55