
I'm using:

  • Azure platform running a microservice-architecture software solution.
  • The microservices use Azure Event Hubs to communicate in special cases.
  • Kubernetes with 2 clusters (primary, secondary).
  • Per application namespace, there is 1 event-listener pod running per cluster to consume from the Event Hub.

The last point is relevant to my current problem: the load balancers share traffic between the primary and secondary clusters, which means 2 event-listener pods are running per application at the same time. They just react to events, but sometimes they both consume the same event from the Event Hub, and this causes duplicated notification mails.

So finally my question is: how can I avoid reading the same event twice at the same time? I thought the Event Hub index (sequence number) is always increasing, but starting at the same moment is not guaranteed.

LenglBoy

1 Answer


You will need to use separate consumer groups per pod to avoid an epoch error (Event Hubs allows only one epoch receiver per consumer group and partition).

That said, both pods will read the same events, so you have two options.

  1. Have an active-passive setup. One consumer group, one pod that reads the events and delegates the work out on each event. If that pod fails, a health/heartbeat mechanism brings the second pod online.

  2. Have an active-active setup. Two consumer groups, two active pods. You will need to implement idempotent processing (see the sketches after this list).
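
As a rough illustration of option 2, here is a minimal Python sketch of one listener pod reading from its own consumer group. It assumes the azure-eventhub SDK; the connection string, hub name, consumer group names and handle() are placeholders.

    from azure.eventhub import EventHubConsumerClient

    # Each cluster's pod gets its own consumer group (hypothetical names
    # "cluster-primary" / "cluster-secondary"), so the two receivers do
    # not kick each other off with an epoch error.
    consumer = EventHubConsumerClient.from_connection_string(
        conn_str="<EVENT_HUB_CONNECTION_STRING>",
        consumer_group="cluster-primary",
        eventhub_name="<EVENT_HUB_NAME>",
    )

    def on_event(partition_context, event):
        handle(event)  # hypothetical business logic; must be idempotent (see below)
        # Persists progress for this consumer group if the client was
        # created with a checkpoint_store (e.g. Blob Storage).
        partition_context.update_checkpoint(event)

    with consumer:
        # Blocks and dispatches every event on every partition to on_event;
        # "@latest" means react only to new events ("-1" reads from the start).
        consumer.receive(on_event=on_event, starting_position="@latest")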

Idempotent processing, where processing the same message multiple times produces the same result, is good practice regardless of approach. It would also allow you to replay a batch of events in which one errored without adverse effects on the integrity of your data.
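
A minimal sketch of that dedup idea, assuming a store shared by both clusters (an in-memory set stands in here for something like Redis or Cosmos DB with a ~7-day TTL); send_notification_mail is a hypothetical handler:

    # Stand-in for a shared store reachable from both clusters (e.g. Redis
    # keys with a 7-day TTL, matching the maximum Event Hub retention).
    processed_keys = set()

    def handle(event, partition_context):
        # Partition id + sequence number uniquely identify an event within
        # the hub, so both consumer groups derive the same key for it.
        key = (partition_context.partition_id, event.sequence_number)
        if key in processed_keys:
            return  # duplicate delivery: skip, no second notification mail
        processed_keys.add(key)
        send_notification_mail(event.body_as_str())  # hypothetical

In a real shared store you would use an atomic set-if-absent operation (e.g. Redis SET with NX and an expiry) so the two pods cannot both pass the check at the same moment.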

I would opt for the first option: a single Event Hub reader will process thousands of events per second and pass the work off to your microservices.
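
One way to get that active-passive behaviour without building your own heartbeat protocol is a lease on a shared resource. Below is a sketch using an Azure Storage blob lease via the azure-storage-blob SDK; the container, blob name and lease duration are placeholder choices.

    from azure.core.exceptions import HttpResponseError
    from azure.storage.blob import BlobClient

    blob = BlobClient.from_connection_string(
        conn_str="<STORAGE_CONNECTION_STRING>",
        container_name="leases",
        blob_name="event-listener-leader",  # hypothetical blob, created up front
    )

    def try_become_leader():
        try:
            # Only one pod can hold the lease at a time; the loser gets a
            # 409 Conflict and stays passive.
            return blob.acquire_lease(lease_duration=15)
        except HttpResponseError:
            return None  # lease already held: retry after a short sleep

The active pod must call renew() on the returned lease periodically; if it dies and stops renewing, the lease expires after 15 seconds and the passive pod's next acquire attempt succeeds, bringing it online.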

If you have lower message volumes and need guaranteed message processing, then Service Bus may be a better choice, where messages can be locked, completed and abandoned.
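
For comparison, a minimal azure-servicebus sketch of that lock/complete/abandon flow, with a hypothetical queue name and process() handler:

    from azure.servicebus import ServiceBusClient

    client = ServiceBusClient.from_connection_string("<SERVICE_BUS_CONNECTION_STRING>")

    with client:
        receiver = client.get_queue_receiver(queue_name="notifications")
        with receiver:
            for msg in receiver:  # messages arrive locked (peek-lock by default)
                try:
                    process(msg)                    # hypothetical business logic
                    receiver.complete_message(msg)  # done: remove from the queue
                except Exception:
                    receiver.abandon_message(msg)   # release the lock for redelivery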

Murray Foxcroft
  • That all sounds logical to me - thanks. I think I need to go for the 2nd approach and see how many services I need to adjust. Maybe I can build hashes of event + payload and store them to see what was already consumed. Do you think this will work? – LenglBoy Jul 08 '20 at 08:57
  • You will have the partition id and sequence number, which can be used for uniqueness across the hub. It is good practice to design a correlation id into your messages to make this a little more intuitive and "sticky" across producers, consumers and hubs: https://learn.microsoft.com/en-us/dotnet/api/azure.messaging.eventhubs.eventdata?view=azure-dotnet – Murray Foxcroft Jul 08 '20 at 09:09
  • Remember that you will only need to store the idempotency identifiers for a max of 7 days (the maximum Event Hub retention), so that could help with volumes. – Murray Foxcroft Jul 08 '20 at 09:10
  • I already knew about those 7 days, but maybe others will be glad about the mention. Thanks a lot for the fast answer. – LenglBoy Jul 08 '20 at 09:15