
To me it seems like Spring Kafka's suffix mechanism would prevent the proper use of a transactional.id in a Kafka application.

As far as I know, the transactional.id has quite a few requirements for proper use by Kafka. These are hard to explain (especially for all cases), so I will concentrate on the case where "read / process / write" is used with exactly-once semantics.

I think I should briefly explain this case as an example here, so we are on the same page; it is also quite complex, and maybe there is a flaw in my understanding.

In general, some process reads a payload M0 from partition P0 of topic T0. It then processes the data, creates a result f(M0), and writes it to another topic T1.

With transactions it would work like this: register a transactional.id with the transaction coordinator, then:

  1. Start a transaction in the context of the registered transactional.id.
  2. Do the processing M0 -> f(M0).
  3. Send f(M0) to T1 (within the transaction).
  4. Commit the offset of partition P0 of topic T0 for message M0 (using the same producer within the same transaction).
  5. Commit the transaction.
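The steps above can be sketched as plain client configuration. This is a minimal, hedged sketch: the broker address, group id, and topic names are placeholders, and the actual transaction loop (KafkaProducer API calls) is only outlined in comments rather than executed.

```java
import java.util.Properties;

public class EosConfigSketch {
    public static void main(String[] args) {
        // Producer side: a stable transactional.id is what lets the
        // coordinator recognize the "same" producer across restarts.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // placeholder
        producerProps.put("transactional.id", "T0.P0");           // the id discussed above
        producerProps.put("enable.idempotence", "true");          // required for transactions

        // Consumer side: only read messages from committed transactions,
        // and never auto-commit -- offsets go through the transaction.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092"); // placeholder
        consumerProps.put("group.id", "my-group");                // placeholder
        consumerProps.put("isolation.level", "read_committed");
        consumerProps.put("enable.auto.commit", "false");

        // The loop itself (KafkaProducer API, shown as comments only):
        //   producer.initTransactions();
        //   producer.beginTransaction();                               // step 1
        //   ... compute f(M0) ...                                      // step 2
        //   producer.send(new ProducerRecord<>("T1", fM0));            // step 3
        //   producer.sendOffsetsToTransaction(offsets, groupMetadata); // step 4
        //   producer.commitTransaction();                              // step 5

        System.out.println(producerProps.getProperty("transactional.id"));
        System.out.println(consumerProps.getProperty("isolation.level"));
    }
}
```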

If the producer dies ungracefully, it could leave behind an open transaction associated with its transactional.id.

If a new producer comes up with the same transactional.id, it would be able to take over and resolve the open transaction (by either committing or aborting it).

But Spring Kafka adds a running number as a suffix per created producer (each template call creates a new one, if it is not taken from the cache). A restarted application could therefore end up with a different transactional.id, even though it is the same application using the same input topic partition.

For example: the transaction originally used T0.P0 10 (where T0.P0 is the given prefix and 10 is the running-number suffix), but the restarted application uses T0.P0 1.
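The suffixing behavior I mean can be modeled with a toy factory (illustrative only, not Spring Kafka's actual code): each newly created producer gets prefix + the next counter value, producers are reused from a cache when available, and the counter starts over on restart.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of a suffixing producer factory (illustrative, not Spring Kafka's code).
class SuffixingFactory {
    private final String prefix;
    private final AtomicInteger suffix = new AtomicInteger();
    private final Deque<String> cache = new ArrayDeque<>();

    SuffixingFactory(String prefix) { this.prefix = prefix; }

    // Reuse a cached id if one exists; otherwise mint prefix + next number.
    String checkout() {
        String id = cache.poll();
        return id != null ? id : prefix + suffix.incrementAndGet();
    }

    void release(String id) { cache.push(id); }
}

public class SuffixDemo {
    public static void main(String[] args) {
        // First run: three concurrent transactions -> ids ...1, ...2, ...3.
        SuffixingFactory run1 = new SuffixingFactory("T0.P0.");
        System.out.println(run1.checkout()); // T0.P0.1
        System.out.println(run1.checkout()); // T0.P0.2
        System.out.println(run1.checkout()); // T0.P0.3

        // After a restart the counter starts over, so the partition that was
        // last handled under T0.P0.3 may now be handled under T0.P0.1.
        SuffixingFactory run2 = new SuffixingFactory("T0.P0.");
        System.out.println(run2.checkout()); // T0.P0.1
    }
}
```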

Am I missing something here? What is the purpose of this suffix?

(I am putting this here as a question because I am not sure whether this is really a bug in Spring Kafka, and I know the maintainers prefer to have such discussions on Stack Overflow rather than (yet) as a ticket.)

snap

1 Answer


We have to suffix the transactional.id because there can only be one producer with a given id. In a multi-threaded environment, we would otherwise have to single-thread all transactions, which would defeat the purpose.

...it would be able to take over and manage the open transactions (either by succeed or abort).

That is not correct; the failed transaction will always be aborted, and a new producer with the same transactional.id will simply fence the old one.

The transaction will time out if no new producer with the same id appears before the transaction timeout elapses.

EDIT

See KIP-447 - since EOS Mode V2 (previously BETA), producer fencing is now based on consumer metadata instead of just the transactional.id.
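As a rough illustration of the KIP-447 idea (a toy model only; the real broker protocol tracks producer epochs and group generations in more detail): the coordinator remembers the newest consumer-group generation it has seen, and a producer sending offsets with a stale generation is fenced regardless of its transactional.id.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of KIP-447-style fencing (illustrative only).
class TxCoordinator {
    private final Map<String, Integer> latestGeneration = new HashMap<>();

    // Returns true if the offsets are accepted, false if the sender is fenced.
    boolean sendOffsets(String groupId, int generation) {
        int latest = latestGeneration.getOrDefault(groupId, -1);
        if (generation < latest) return false;     // stale instance -> fenced
        latestGeneration.put(groupId, generation); // remember the newest generation
        return true;
    }
}

public class FencingDemo {
    public static void main(String[] args) {
        TxCoordinator broker = new TxCoordinator();

        // Old app instance, consumer group generation 5.
        System.out.println(broker.sendOffsets("my-group", 5));  // true

        // The group rebalances; the new instance is generation 6 and may use a
        // completely different transactional.id -- it is still accepted.
        System.out.println(broker.sendOffsets("my-group", 6));  // true

        // A zombie of the old instance retries with generation 5 -> fenced.
        System.out.println(broker.sendOffsets("my-group", 5));  // false
    }
}
```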

Gary Russell
  • but isn't it wrong (at least for the described scenario) if I have multiple producers for the same partition with different `transactional.id`s, because fencing will not work any more afaik --- and if I start the application and don't get the `transactional.id` I used before (because last time it was on the 3rd thread and now I use fewer threads), then I get into the timeout scenario. Timeouts are 7 days with the default broker configuration, and should be avoided IMO – snap Aug 07 '23 at 14:16
  • No; the listener container gets a producer from the cache and starts the transaction; all send operations on that consumer thread use the same producer, and the container commits the transaction when the listener exits and the producer is returned to the cache. You will never have multiple producers for the same group/topic/partition. Sends can only participate in the transaction if they run on the consumer thread. Transaction timeouts are 60 seconds by default. – Gary Russell Aug 07 '23 at 14:38
  • With EOSMode.V1 (previously ALPHA) we had to have a producer for each group/topic/partition; which could mean you had thousands of producers; V2 (BETA) was designed to avoid that. – Gary Russell Aug 07 '23 at 14:44
  • Ok, but even if I cannot have multiple producers for the same `group/topic/partition`, a given partition could get a different suffix on different runs of the application (depending on the order producers are created). Regarding the timeout, I guess we mean different concepts; I was talking about the Kafka one for messages in the open-transaction state: https://docs.confluent.io/platform/current/installation/configuration/broker-configs.html#transactional-id-expiration-ms. I will take a look at V1 & V2 for better understanding, but for now I still see an issue in the definition of the `transactional.id` – snap Aug 07 '23 at 14:56
  • Fencing is no longer only related to a producer with the same tx id. If a new producer sends offsets to a transaction using the same consumer group metadata, the broker fences the other producer and immediately aborts the partial transaction. https://kafka.apache.org/documentation/#producerconfigs_transaction.timeout.ms I just tested it; consume, produce, kill the app before commit; open transaction in the log. Start a new instance with a different tx.id prefix and the newly sent record is received without waiting the 60 seconds. See KIP-447. – Gary Russell Aug 07 '23 at 15:23
  • Another good resource: https://www.confluent.io/blog/simplified-robust-exactly-one-semantics-in-kafka-2-5/ – Gary Russell Aug 07 '23 at 15:32
  • thx for all the information and for testing! Seems like my information was outdated, even though it was based on the current documentation of Kafka transactions. I will read the recommended blog post and then come back here – snap Aug 07 '23 at 15:48
  • I'm now convinced that it's not an issue. If you add to your answer that since KIP-447 it is no longer a requirement to reflect the partition-to-producer binding statically in the `transactional.id`, then I will directly accept it (I guess that was the key information I was missing). – snap Aug 10 '23 at 14:48
  • I have added a note to the answer. – Gary Russell Aug 10 '23 at 14:53