
I have a microservice architecture in which I need to guarantee that incoming messages sharing a common identifier are processed in the order they arrive from Kafka:

      message2, message1 kafka
     ------------------------------
             |message1       |message2
             |               |
         Instance1        Instance2

In the example above, I have two instances of a service processing messages from Kafka, but I want to ensure that message2 is only processed after message1.

This situation is easily solved by configuring a single instance to consume from a particular partition that stores all messages with the common identifier:

message2, message1 kafka
--------------------------------
       | message2
       | message1
     Instance1        Instance2

Now the order is guaranteed, and message2 will never be processed before message1.

However, I was wondering whether this issue could be solved another way, directly in code, instead of relying on infrastructure. This looks like a standard problem in microservice architecture, but I'm not sure what the preferred approach would be.

Zed
  • Maybe an option: use the key when sending the message. That ensures that messages with the same key end up in the same partition: https://stackoverflow.com/questions/29511521/is-key-required-as-part-of-sending-messages-to-kafka – Abbin Varghese Jun 26 '19 at 14:00

3 Answers


Kafka only guarantees ordering within a partition.

So if you want "message1" to be processed before "message2", you need to ensure both messages end up on the same partition. Then any consumer reading these messages is guaranteed to see them in the order they were produced.
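To make the per-partition guarantee concrete, here is a small self-contained simulation in plain Java (no Kafka client involved). It assumes a hypothetical `partitionFor` that hashes the key with `String.hashCode()`; Kafka's default partitioner actually uses murmur2 over the serialized key bytes, but the property illustrated is the same: equal keys always map to the same partition, and each partition keeps records in the order they were appended.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyPartitioningDemo {

    // A produced record; the key plays the role of the "common identifier".
    record Msg(String key, String payload) {}

    // Hypothetical partitioner for illustration only: Kafka's default partitioner
    // hashes the serialized key bytes with murmur2, not String.hashCode().
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Appends each message to its partition's log, so order is kept per partition.
    static List<List<Msg>> produce(List<Msg> msgs, int numPartitions) {
        List<List<Msg>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Msg m : msgs) {
            partitions.get(partitionFor(m.key(), numPartitions)).add(m);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Msg> sent = List.of(
                new Msg("order-42", "message1"),
                new Msg("order-7", "other"),
                new Msg("order-42", "message2"));
        List<List<Msg>> partitions = produce(sent, 3);
        int p = partitionFor("order-42", 3);
        // Both order-42 messages sit in the same partition, in send order.
        System.out.println(partitions.get(p).stream()
                .filter(m -> m.key().equals("order-42"))
                .map(Msg::payload)
                .toList()); // [message1, message2]
    }
}
```

Whichever consumer owns that partition therefore reads message1 before message2, regardless of how many instances are in the group.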

Mickael Maison
  • Yes, I'm aware of that, and yes, it solves my problem. However, I was wondering whether it would make sense not to rely on infrastructure but on code to solve this kind of problem. I'm not really sure how I would even tackle the issue that way. – Zed Jun 25 '19 at 15:18
  • I'd recommend relying on Kafka for a strict ordering guarantee. Not sure what your use case is or what sort of guarantees you need but consuming messages from multiple partitions in order is not trivial if you want to cover scenarios with many clients – Mickael Maison Jun 25 '19 at 18:24

I'd suggest infrastructure as the more "correct" way to go, but solving this with code should be possible:

If you have a single producer of messages, attach to each message the identifier of the directly preceding message, and before processing a message, make sure you have already processed its predecessor.
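A minimal in-memory sketch of this idea for the single-producer case, assuming each message carries hypothetical `id` and `prevId` fields (`prevId` is `null` for the first message). One substitution to note: instead of retrying an out-of-order message after a delay, this version parks it until its predecessor has been handled. State lives only in memory here; a real service would need to persist `lastProcessedId` and cope with redeliveries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class PredecessorOrderedConsumer {

    // Hypothetical message shape: prevId is null for the very first message.
    record Msg(String id, String prevId, String payload) {}

    private String lastProcessedId = null;                          // null = nothing processed yet
    private final Map<String, Msg> waitingOnPrev = new HashMap<>(); // parked messages, keyed by prevId
    private final List<String> processed = new ArrayList<>();

    // Called for every message, in whatever order the consumer receives them.
    public void onMessage(Msg m) {
        if (!Objects.equals(m.prevId(), lastProcessedId)) {
            // Predecessor not handled yet: park the message instead of processing it.
            waitingOnPrev.put(m.prevId(), m);
            return;
        }
        handle(m);
        // Release any parked messages whose predecessor has now been handled.
        Msg next;
        while ((next = waitingOnPrev.remove(lastProcessedId)) != null) {
            handle(next);
        }
    }

    private void handle(Msg m) {
        processed.add(m.payload()); // stand-in for the real business logic
        lastProcessedId = m.id();
    }

    public List<String> processed() {
        return processed;
    }

    public static void main(String[] args) {
        PredecessorOrderedConsumer c = new PredecessorOrderedConsumer();
        c.onMessage(new Msg("m2", "m1", "message2")); // arrives first, gets parked
        c.onMessage(new Msg("m1", null, "message1"));
        System.out.println(c.processed()); // [message1, message2]
    }
}
```

This only works if a single consumer sees the whole chain; splitting it across instances would require the `lastProcessedId` state to be shared.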

If you have multiple producers, this gets a bit more tricky, as you'd have to synchronise the identifiers.

Again, I consider infrastructure the more "correct" way of solving this (the less code you write, and the simpler it is, the fewer bugs you'll have).

orirab
  • Interesting solution, but let's say there's just a single producer: 1. Why would I need the identifiers of all previous messages? Wouldn't it be enough to check just the identifier of the last message? If I really needed all identifiers, wouldn't a message eventually carry hundreds of thousands of them? 2. What exactly do you mean by checking before consuming the message? I thought consumption was automatic in Spring, so I'm not sure how I can check whether a message satisfies the criteria before consuming it from Kafka. Thanks. – Zed Jun 25 '19 at 17:43
  • 1. You are, of course, correct: you only need the last one. – orirab Jun 25 '19 at 17:58
  • 2. As far as I know, you'd have to implement this yourself: keep the last identifier you handled in memory or in a DB, and for each new message, if it directly follows that identifier, handle it as expected. Otherwise, retry handling it in X seconds (hopefully the directly preceding message will have been consumed by then). – orirab Jun 25 '19 at 18:02
  • If this is a fitting answer, would you consider accepting it? – orirab Jun 26 '19 at 21:03

You can disable the auto-commit feature and manually commit the offset of each message you have processed. Take a look at this link to see how to configure it. Then, by keeping a variable that holds the index of the last processed message, you can do what you want, but you must ensure that only one instance of the code accesses this variable at a time. You can use another microservice to store and protect this value, using something like a semaphore.

So each consumer waits until all messages preceding the current one have been processed, and only then processes the current message, which preserves the order of messages.
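The waiting scheme can be sketched in-process with a lock and a condition variable. This is only a stand-in: the answer places the shared counter in a separate service guarded by something like a semaphore, whereas here a `ReentrantLock` inside one JVM plays that role, and `SequenceGate` and `process` are hypothetical names. Each consumer thread blocks until the counter reaches `seq - 1`, processes its message, then advances the counter and wakes the others.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class SequenceGate {

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition advanced = lock.newCondition();
    private long lastProcessed; // index of the last message that finished processing

    public SequenceGate(long lastProcessed) {
        this.lastProcessed = lastProcessed;
    }

    // Blocks until every message before `seq` is done, runs the work,
    // then advances the counter and wakes all waiting consumers.
    public void process(long seq, Runnable work) throws InterruptedException {
        lock.lock();
        try {
            while (lastProcessed != seq - 1) {
                advanced.await(); // releases the lock while waiting
            }
            work.run();
            // With auto-commit disabled, a real consumer would commit the offset here.
            lastProcessed = seq;
            advanced.signalAll();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SequenceGate gate = new SequenceGate(0);
        List<Long> order = Collections.synchronizedList(new ArrayList<>());
        List<Thread> consumers = new ArrayList<>();
        // Start the "consumers" in reverse so message 3 is picked up first.
        for (long seq = 3; seq >= 1; seq--) {
            final long s = seq;
            Thread t = new Thread(() -> {
                try {
                    gate.process(s, () -> order.add(s));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumers.add(t);
            t.start();
        }
        for (Thread t : consumers) {
            t.join();
        }
        System.out.println(order); // [1, 2, 3]
    }
}
```

Because `work.run()` executes while the lock is held, only one message is processed at a time no matter how many consumer threads exist, which illustrates the performance concern raised in the next paragraph.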

However, this solution adds complexity to the code, and what is the benefit of using more than one consumer in this case? At best, there is no performance difference between using 1 consumer and 10 consumers if you want to preserve message order, because each consumer must wait for the previous messages to arrive.