For me it's seems like spring kafka's suffix mechanism would avoid the proper use of a transactional.id
in a kafka application.
As far as i know the transactional.id
has quite some requirements for proper usage by kafka. Which are hard to explain (especially for all cases), so i will concentrate on the case where a "read / process / write" in an "exactly once semantic" way is used.
I think i should explain this case as an example in short here, so we are on the same page; also it's quite complex and maybe i have a flaw in my understanding.
In general some process is reading from some partition P0 of topic T0 the payload M0. Then it processes some data and creates a result F(M0) and writes it to another topic T1.
With transaction it would work like. Registering a transactional.id
to the transaction coordinator.
- Start a transaction in the context of the registered
transactional.id
. - Do processing M0 -> f(M0).
- Send f(M0) to T1 (within the transaction)
- Commit offset of partition P0 and topic T0 for message M0 (uses same producer within same transaction)
- Commit the transaction
If the producer dies in a not graceful way, it could have produced some commit which has the state open transaction
with the transaction id of this producer.
If a new producer comes up with the same transactional.id
it would be able to take over and manage the open transactions (either by succeed or abort).
But if spring kafka would add a running number as suffix per created producer (for each template call there will be a new created one (if not taken from cache)). Then a restarted application could have a different transactional.id
even if it's the same application and is using the same input topic partition.
Like original Transaction used was T0.P0 10
(where T0.P0
is the given prefix and 10
was the running number postfix). And the started application uses T0.P0 1
.
Do i miss something here? What's the purpose of this suffix?
(I put this as a question here because i'm not sure if this is really a bug in spring kafka (and i know they prefer to have such discussions on stackoverflow and not (yet) as a ticket)
sources:
- How to pick a Kafka transaction.id
- https://www.confluent.io/de-de/blog/transactions-apache-kafka/ -> search for "How to pick a transactional.id"