We have a spring boot application build using Kafka streams which needs to integrate with an external application. So we have defined two topologies:
1. Request topology:
- A (
complex-type-key
,string-value
) record arrives in theinput-topic
. - From the
complex-type-key
we extract an uniquestring-key
and store thestring-key
and thestring-value
in a state store (local and persistent) - We transform the
string-value
in astring-request
and write the (string-key
,string-request
) record to the external application we are integrating with
2. Response topology:
- We receive the (
string-key
,string-response
) record from the external application in our response topic - We look up
string-key
in the state store and want to retrieve the originalstring-value
- Based on the original
string-value
andstring-request
we do some processing and forward the original (complex-type-key
,string-value
) tooutbound-topic-1
oroutbound-topic-2
All works fine when all topics have only one partition but at soon as we define our topics with multiple partitions the lookup in the state store does not find the string-value
for the given string-key
. We suspected that was because complex-type-key
and string-key
are different types and such will be processed by different stream processors. Having the sate store local would make not possible to retrieve the string-value
as in fact there would be one storage to save and a different one to look up. However we tried to convert the complex-type-key
int the string-key
and pass it through
an internal topic with a key of type String before saving to the state store.
This make me think our understanding about how all these state stores work is not clear and we are using it in a wrong way.
Thank you in advance for your inputs.
NOTE:
We solved the issue by remapping complex-type-key
into the string-key
and storing it together with the string-value
into an internal topic. Then when the response comes back we window leftJoin
it with this internal topic. This way we gave up the usage of the state store entirely. However I still believe our initial approach was correct, providing it worked fine with one partition. It would still be good to understand what we done wrong. My feeling is that window left join solution has the two disadvantages of consuming more storage (costly in AWS) as well as an inaccurate window length can lead to loosing responses which could mean support calls