Suppose I have one topic with 6 partitions and 2 consumers, where P1, P2, P3 are processed by C1 and P4, P5, P6 are processed by C2. Let us say data for user U1 always goes to P1, U2 to P2, and so on.

So,

C1 maintains state of users U1, U2, U3
C2 maintains state of users U4, U5, U6.

Now let us say we add one more consumer, C3. A rebalance happens, and the assignment becomes:

P1, P2, P3 -> C1
P4, P5 -> C2
P6 -> C3

So my application was maintaining the state of user U6 in C2, but now U6's data is flowing to C3.

Somehow the U6 state held by C2 has to move to C3. How is this achieved in Kafka, given that it is a very common problem?

OR

If Kafka does not provide any support for this, how is the problem generally solved? Is there a design pattern for it?

1 Answer

Kafka isn't going to do that for you; you will need to develop your own logic for this. Yes, this is a common problem, but in some ways what you are trying to do goes against Kafka's design goals. For an eye-opening read, see some background on Kafka's design here.

Specifically, read the section "Don't Fear the Filesystem". You are making your problem more difficult by building (I assume complex) in-memory data structures to maintain state. Why not log that state to Kafka, so that a consumer can pick up right where the previous consumer left off?
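To make the idea concrete, here is a minimal sketch in Python. It does not use a Kafka client at all: a dict of plain lists stands in for a state topic, and every name in it (`changelog`, `Consumer`, `partition_for`, `on_partitions_assigned`) is hypothetical. It assumes state is written to the log before the local cache is updated, and that whoever is assigned a partition rebuilds its cache by reading that partition, with the last record per user winning.

```python
from collections import defaultdict

# In-memory stand-in for a Kafka "state" topic: one append-only list of
# (user_id, state) records per partition. Not a real Kafka API.
changelog = defaultdict(list)  # partition number -> [(user_id, state), ...]


def partition_for(user_id):
    # Mirrors the question's layout: U1 -> P1, U2 -> P2, and so on.
    return int(user_id[1:])


class Consumer:
    def __init__(self, name):
        self.name = name
        self.state = {}  # local cache only; the log is the source of truth

    def update(self, user_id, new_state):
        # Write to the log first, then update the local cache.
        changelog[partition_for(user_id)].append((user_id, new_state))
        self.state[user_id] = new_state

    def on_partitions_assigned(self, partitions):
        # Rebuild the cache from the assigned partitions only;
        # the last record per user wins.
        self.state = {}
        for p in partitions:
            for user_id, value in changelog[p]:
                self.state[user_id] = value


c2 = Consumer("C2")
c2.on_partitions_assigned([4, 5, 6])
c2.update("U6", {"clicks": 1})
c2.update("U6", {"clicks": 2})
c2.update("U4", {"clicks": 7})

# Rebalance: P6 moves from C2 to the new consumer C3.
c3 = Consumer("C3")
c2.on_partitions_assigned([4, 5])
c3.on_partitions_assigned([6])
```

After the simulated rebalance, C3 holds the latest U6 state and C2 no longer holds it, without the two consumers ever talking to each other. In a real application the equivalent hook would be the client's rebalance callback (for example `ConsumerRebalanceListener` in the Java client), and the state topic could be a compacted topic so that only the latest record per key has to be read back.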

Once your thinking becomes "first I put the data in Kafka, then I use it in my application", all of your consumers have access to the same data. There is no "private" in-memory cache, and your problem is much simpler to solve.

David Griffin
  • Thanks! It seems you are asking me to go the way offsets are committed in Kafka, through the `__consumer_offsets` topic? –  Apr 18 '16 at 13:06
  • Similar solution, yes. I do it with everything. Another approach actually is to not bother keeping state. Kafka is optimized for reading long runs of messages. Just have the (new) consumer build its own state by reading from the beginning. Worry about optimizing that later. – David Griffin Apr 18 '16 at 13:12
  • Another option would be to use `Zookeeper`, which was what Kafka used prior to `__consumer_offsets`. I asked a question related to this a while back, might be interesting to you: http://stackoverflow.com/questions/35869786/kafka-instead-of-zookeeper-for-cluster-management – David Griffin Apr 18 '16 at 16:41
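The "don't keep state, rebuild it by reading from the beginning" approach from the comments can be sketched as a plain fold over the partition's events. This is again an in-memory Python illustration, not Kafka client code: the event list, the event kinds, and the helper names `apply_event` and `rebuild_state` are made up for the example.

```python
# Hypothetical raw events for partition P6, oldest first.
p6_events = [
    ("U6", "login"),
    ("U6", "click"),
    ("U6", "click"),
    ("U6", "logout"),
]


def apply_event(state, event):
    # Example per-user state: a click counter and a logged-in flag.
    user_id, kind = event
    user = state.setdefault(user_id, {"clicks": 0, "logged_in": False})
    if kind == "click":
        user["clicks"] += 1
    elif kind == "login":
        user["logged_in"] = True
    elif kind == "logout":
        user["logged_in"] = False
    return state


def rebuild_state(events):
    # Fold every event, starting from the first one, into a fresh state.
    state = {}
    for event in events:
        apply_event(state, event)
    return state


# C3 has just been assigned P6 and derives U6's state from the events alone.
c3_state = rebuild_state(p6_events)
```

The trade-off is replay time: the new consumer has to read the whole partition before it is caught up, which is why the comment suggests worrying about optimizing that later. With the Java consumer, rewinding an assigned partition is done with `seekToBeginning`.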