EPOCH Error while communicating with Azure Event Hub

Question

I am using Azure Event Hub to listen for real-time data in my application. Most of the time it works fine, but sometimes it throws the following error:

New receiver with higher epoch of '3109' is created hence current receiver with epoch '3108' is getting disconnected. If you are recreating the receiver, make sure a higher epoch is used. TrackingId:eb2a6f970000494500f379f85b484a9f_C-1553490498_B22, SystemTracker:xxxxxxxxxx:eventhub:xxxxxxx~2730|$default, Timestamp:7/13/2018 6:48:54 AM.

and the application stops processing data. I have gone through several articles online but have not been able to find a solution. On MSDN, I read

that EPH relies on the fact that "there can only be 1 active epoch receiver on a consumer group at any given time"

but I am not sure how to ensure that there will be only 1 active epoch receiver. Also, the same Event Hub is used by three different environments: Development, Test, and Production.

Any suggestions are highly appreciated.

Murray Foxcroft · Answer 1 · 2020-11-12T08:26:34.567

It sounds like you are running two instances of the application, two concurrent classes, or two applications that use the same event hub consumer group. Event hub consumer groups are effectively pointers to a point in time on the event stream. If you try to use one consumer group with two instances of code, you get a conflict like the one you are seeing.

Either:

Ensure you only have a single instance reading the consumer group at a time.
Use two consumer groups when you need two separate programs or sets of functionality to process the event hub at the same time.
If you are looking to parallelize for performance, look into event hub partitioning and how to take advantage of processing each partition independently.
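
To make the conflict concrete, here is a minimal toy model in Python. It is not the Event Hubs SDK: `ToyPartition`, `ReceiverDisconnected`, and the `production-group` consumer group are hypothetical names, and the rule that only a strictly higher epoch may take over is a simplification for illustration. It only shows why two readers on one consumer group collide while readers on separate consumer groups do not.

```python
class ReceiverDisconnected(Exception):
    """Hypothetical stand-in for the disconnect error quoted in the question."""


class ToyPartition:
    """Toy model of epoch ownership on one partition (not the real service)."""

    def __init__(self):
        # consumer group name -> (epoch, owner) of the receiver that currently owns it
        self.owners = {}

    def attach(self, consumer_group, owner, epoch):
        current = self.owners.get(consumer_group)
        # Simplification: only a strictly higher epoch may take over.
        if current is not None and epoch <= current[0]:
            raise ValueError("epoch must be higher than the current receiver's")
        self.owners[consumer_group] = (epoch, owner)
        # Return the owner that was displaced, or None if the group was free.
        return current[1] if current is not None else None

    def receive(self, consumer_group, owner):
        epoch, current_owner = self.owners[consumer_group]
        if owner != current_owner:
            raise ReceiverDisconnected(
                f"receiver '{owner}' was replaced by one with epoch '{epoch}'"
            )
        return f"event delivered to {owner}"


partition = ToyPartition()
partition.attach("$default", "development", 3108)
# Same consumer group: the newer, higher-epoch receiver displaces the older one.
displaced = partition.attach("$default", "test", 3109)
# Separate consumer group: no conflict with the readers on $default.
partition.attach("production-group", "production", 1)
```

In this sketch the `development` reader fails on its next `receive` call, which mirrors the symptom in the question, while `test` and `production` keep reading independently.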

There is also an alternative scenario: the event hub's internal load balancing can move a partition to another host, and that can produce the same error. When that is the cause, just log it and continue.
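
As a rough sketch of "log it and continue", an error handler might look like the following. The names here are assumptions for illustration: `ReceiverDisconnected` is a hypothetical exception class, and `on_error` with its boolean return value is not the signature of any real Event Hubs callback, so adapt it to whatever error hook your client library exposes.

```python
import logging

logger = logging.getLogger("eventhub-consumer")


class ReceiverDisconnected(Exception):
    """Hypothetical stand-in for the epoch disconnect error."""


def on_error(partition_id, error):
    """Return True if processing should carry on, False if it should stop.

    Hypothetical handler shape, not a real SDK callback signature.
    """
    if isinstance(error, ReceiverDisconnected):
        # Expected when a partition is handed to another host: log and move on.
        logger.warning("Partition %s was taken over: %s", partition_id, error)
        return True
    # Anything else is unexpected and should be surfaced.
    logger.error("Unexpected error on partition %s: %s", partition_id, error)
    return False
```

The point of the split is that an epoch takeover is treated as routine, while any other failure still gets escalated.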

Here is some good documentation to help you on your way.

edited Nov 12 '20 at 08:26

answered Jul 20 '18 at 12:52

Is there a way to set up some policy so a replica which failed to cut it, starts all over and continues to get into the stream? Of course, without restart. – Dmytro Zhluktenko Mar 14 '19 at 17:48
Yes, using a checkpoint/offset, but you need to manage it in your code. I'd look for a more elegant design though. – Murray Foxcroft Mar 14 '19 at 17:50
Would you consider the following solution as more elegant? Two EH consumer replicas listening to the same consumer group which has 4 partitions. The first replica takes the first two partitions, the second replica takes the rest. I assume it's best to keep *only one* consumer replica on a partition, isn't it? – Dmytro Zhluktenko Mar 14 '19 at 18:09
You can only have one reader per consumer group. Trying to manage at the partition level will only create headaches. Have you considered Service Bus as an alternative? – Murray Foxcroft Mar 14 '19 at 19:40
I am reading the stream into a dataframe and then filtering that DF to get the invalid events. Now I want to iterate through the invalid events to log them, so I am doing a foreach over the invalidEvents DF. I am getting the same epoch error while doing so. Any workaround for this? So basically it is reading once, and writing once plus writing a subset once more. – Preeti Joshi Jul 10 '19 at 10:59
