0

We're currently building an actor system with DDD Principles on top of Akka.NET.

We have several missing points in how to make our service resilient:

  • At-Most-Once-Delivery by default between Actors
  • Resilience of the actors mailboxes
  • FSMActors are stashing incoming messages, which couldn't be processed immediately - resilience?
  • Pub/Sub Pattern (and resilience)

We're not sure what to do if some messages are getting lost and therefore we can't transit to the next state to finalize a request, which is involving several actors.

My Idea was to use a event streaming system like kinesis for passing messages arround. We then have the resilience everywhere and just have to know which event in the stream we've processed. Am I missing something else? Do you think this is a goot idea? Is this violating some best practices?

Jeffrey Chung
  • 19,319
  • 8
  • 34
  • 54
trialgod
  • 302
  • 1
  • 15

1 Answers1

0

At-most-once delivery is a deliberate choice when it comes to Akka.NET (and actually all popular distributed actor-model implementations out there). In the past original Akka from JVM had mailbox implementations over persistent queues, but this idea was dropped years ago as failed experiment.

  • Akka.NET is heavily oriented around message passing. For every user request, there may be dozens (or hundreds) of messages passed between actors in order to process it. With in-memory messaging this is fast and easy (Akka.NET can pass millions of msgs/sec. within single machine).
  • Usually what you're thinking of when it comes to reliable processing, is not reliable, persistent mailbox that matters. The clue is to reliably process a message - you can easily take off message from queue or log and a machine will break before you'll finish processing it.
  • At-least-once processing enforces you to either be able to recognize duplicates (which aside of redeliveries is not cost-effective for every single actor) or design all of your processing logic to work in idempotent way.

In some cases having acknowledgement is enough i.e. user sends a request and expects a response to arrive within some timeout. If reply didn't arrive i.e. because some messages were lost, just return a failure to the requester (and possibly ask him/her for retry).

Another common pattern is to use queue/log in specific places i.e. as the layer in front of your actor system. This way all of the user requests are first send to persistent queue, from which they are later picked up by actor system logic. Once actors inside will finish processing the request, they can commit it and remove from the queue. If some failure happened, whole process (or part of it) is simply retried.

Bartosz Sypytkowski
  • 7,463
  • 19
  • 36