My Kafka producers distribute messages across topic partitions based on a key.
So, on the Spark side, the messages that need to be processed together already arrive in the same partition.
Now I need a groupByKey so that, within each partition, the values are aggregated into a list per key. There is no need to merge data across partitions, because a given key can never appear in more than one partition.
How can I perform this groupByKey at the partition level only, without a full shuffle?
|topic-partition1| ---> |spark-partition1| -- groupByKey --> |spark-partition1.1| -- mapGroupsWithState --> ...
|topic-partition2| ---> |spark-partition2| -- groupByKey --> |spark-partition2.1| -- mapGroupsWithState --> ...
|topic-partition3| ---> |spark-partition3| -- groupByKey --> |spark-partition3.1| -- mapGroupsWithState --> ...
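For reference, here is a minimal Scala sketch of the pipeline described above, as I currently have it. The topic name, broker address, and the `List[String]` state type are placeholders; the point is that `groupByKey` here still plans a shuffle exchange, even though every key is already confined to a single partition:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.GroupState

object PartitionLocalGrouping {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("partition-local-grouping").getOrCreate()
    import spark.implicits._

    // Placeholder broker/topic names for illustration.
    val messages = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my-topic")
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
      .as[(String, String)]

    // groupByKey introduces a shuffle even though every occurrence of a key
    // already lives in a single Kafka -- and hence Spark -- partition.
    val updated = messages
      .groupByKey { case (key, _) => key }
      .mapGroupsWithState[List[String], (String, Int)] {
        (key: String, values: Iterator[(String, String)], state: GroupState[List[String]]) =>
          // Accumulate this key's values into a list, as described above.
          val all = state.getOption.getOrElse(Nil) ++ values.map(_._2)
          state.update(all)
          (key, all.size)
      }

    updated.writeStream
      .outputMode("update") // required by mapGroupsWithState
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

This runs, but the shuffle between the `spark-partitionN` and `spark-partitionN.1` stages is exactly what I would like to eliminate.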