tl;dr
No, a key is not required as part of sending messages to Kafka. But...
In addition to the very helpful accepted answer, I would like to add a few more details.
Partitioning
By default, Kafka uses the key of the message to select the partition of the topic it writes to. This is done in the DefaultPartitioner by
kafka.common.utils.Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
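If no key is provided, Kafka spreads the data across the partitions in a round-robin fashion (since Kafka 2.4, the default partitioner instead sticks to one partition per batch before moving on to another).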
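To make the formula above concrete, here is a minimal sketch of the key-to-partition mapping. It assumes a stand-in hash (Arrays.hashCode) in place of Kafka's murmur2, so the partition numbers it produces will not match what a real producer computes; the class and method names are hypothetical. Only the toPositive masking and the modulo step mirror the expression quoted above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class KeyPartitionSketch {

    // Clears the sign bit so the result is never negative
    // (Math.abs would still return a negative number for Integer.MIN_VALUE).
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // ASSUMPTION: stand-in for Utils.murmur2(keyBytes). This is NOT the hash Kafka uses.
    static int standInHash(byte[] keyBytes) {
        return Arrays.hashCode(keyBytes);
    }

    // Same shape as the expression in the answer: toPositive(hash) % numPartitions.
    static int partitionFor(byte[] keyBytes, int numPartitions) {
        return toPositive(standInHash(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "customer-1".getBytes(StandardCharsets.UTF_8);
        // Equal key bytes always land on the same partition.
        System.out.println(partitionFor(key, 2) == partitionFor(key, 2));
    }
}
```

The important property is determinism: as long as the number of partitions does not change, equal keys always end up in the same partition.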
In Kafka, it is possible to create your own partitioner by implementing the Partitioner interface. For this, you need to override the partition method, which has the signature:
int partition(String topic,
Object key,
byte[] keyBytes,
Object value,
byte[] valueBytes,
Cluster cluster)
Usually, the key of a Kafka message is used to select the partition, and the return value (of type int) is the partition number. Without a key, you need to rely on the value, which might be much more complex to process.
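As a rough illustration of that trade-off, the sketch below mirrors the parameters of the partition method but is not a real Partitioner: the Cluster argument is replaced by a plain partition count (an assumption, so the snippet compiles without the Kafka client library), and Arrays.hashCode again stands in for murmur2. The class name is hypothetical. It routes on the key when there is one and falls back to the value bytes otherwise.

```java
import java.util.Arrays;

final class ValueFallbackPartitionSketch {

    // Mirrors partition(topic, key, keyBytes, value, valueBytes, cluster),
    // with ASSUMPTION: "Cluster cluster" replaced by "int numPartitions".
    static int partition(String topic,
                         Object key,
                         byte[] keyBytes,
                         Object value,
                         byte[] valueBytes,
                         int numPartitions) {
        // Prefer the key; without one, the only thing left to route on is the value.
        byte[] routingBytes = (keyBytes != null) ? keyBytes : valueBytes;
        if (routingBytes == null) {
            return 0; // nothing to hash: arbitrary choice for this sketch
        }
        // Arrays.hashCode is a stand-in for Kafka's murmur2 hash.
        return (Arrays.hashCode(routingBytes) & 0x7fffffff) % numPartitions;
    }
}
```

With a key, two messages for the same customer land in the same partition regardless of their payload. With the value fallback, two different payloads for the same customer may well land in different partitions, unless you parse the value and extract a stable field first.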
Ordering
As stated in the given answer, Kafka has guarantees on ordering of the messages only at partition level.
Let's say you want to store financial transactions for your customers in a Kafka topic with two partitions. The messages could look like (key:value)
null:{"customerId": 1, "changeInBankAccount": +200}
null:{"customerId": 2, "changeInBankAccount": +100}
null:{"customerId": 1, "changeInBankAccount": +200}
null:{"customerId": 1, "changeInBankAccount": -1337}
null:{"customerId": 1, "changeInBankAccount": +200}
As we have not defined a key, the two partitions will presumably look like
// partition 0
null:{"customerId": 1, "changeInBankAccount": +200}
null:{"customerId": 1, "changeInBankAccount": +200}
null:{"customerId": 1, "changeInBankAccount": +200}
// partition 1
null:{"customerId": 2, "changeInBankAccount": +100}
null:{"customerId": 1, "changeInBankAccount": -1337}
Your consumer reading that topic could end up telling you that the balance on the account is 600 at a particular time although that was never the case! Just because it was reading all messages in partition 0 prior to the messages in partition 1.
With a meaningful key (like customerId) this could be avoided, as the partitioning would look like this:
// partition 0
1:{"customerId": 1, "changeInBankAccount": +200}
1:{"customerId": 1, "changeInBankAccount": +200}
1:{"customerId": 1, "changeInBankAccount": -1337}
1:{"customerId": 1, "changeInBankAccount": +200}
// partition 1
2:{"customerId": 2, "changeInBankAccount": +100}
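The effect can be replayed with a small simulation. This is only a sketch: it models each partition as a list, assumes a consumer that reads partition 0 completely before partition 1 (as in the scenario above), and uses hypothetical class and method names. It collects every balance the consumer would observe for one customer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OrderingSketch {

    // One message: the customer it belongs to and the change in the account.
    static final class Tx {
        final int customerId;
        final int change;
        Tx(int customerId, int change) {
            this.customerId = customerId;
            this.change = change;
        }
    }

    // Reads the partitions one after another and returns the running balance
    // after each message that belongs to the given customer.
    static List<Integer> observedBalances(List<List<Tx>> partitions, int customerId) {
        List<Integer> balances = new ArrayList<>();
        int balance = 0;
        for (List<Tx> partition : partitions) {
            for (Tx tx : partition) {
                if (tx.customerId == customerId) {
                    balance += tx.change;
                    balances.add(balance);
                }
            }
        }
        return balances;
    }

    // Layout from the answer when no key is set.
    static List<List<Tx>> withoutKey() {
        return Arrays.asList(
            Arrays.asList(new Tx(1, 200), new Tx(1, 200), new Tx(1, 200)),
            Arrays.asList(new Tx(2, 100), new Tx(1, -1337)));
    }

    // Layout from the answer when customerId is the key.
    static List<List<Tx>> withCustomerIdKey() {
        return Arrays.asList(
            Arrays.asList(new Tx(1, 200), new Tx(1, 200), new Tx(1, -1337), new Tx(1, 200)),
            Arrays.asList(new Tx(2, 100)));
    }

    public static void main(String[] args) {
        System.out.println(observedBalances(withoutKey(), 1));        // [200, 400, 600, -737]
        System.out.println(observedBalances(withCustomerIdKey(), 1)); // [200, 400, -937, -737]
    }
}
```

Both runs end at -737, but only the unkeyed layout passes through the balance of 600 that never existed in the original sequence of transactions.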
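Remember that the ordering within a partition is only guaranteed with the producer configuration max.in.flight.requests.per.connection set to 1 (or with the idempotent producer, enable.idempotence=true, which preserves ordering with up to 5 in-flight requests). The default value for that configuration is, however, 5, and it is described as: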
"The maximum number of unacknowledged requests the client will send on a single connection before blocking. Note that if this setting is set to be greater than 1 and there are failed sends, there is a risk of message re-ordering due to retries (i.e., if retries are enabled)."
You can find more details on this in another Stack Overflow post on Kafka - Message Ordering Guarantees.
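For reference, a minimal sketch of how that setting could be applied on the producer side. The class and method names are hypothetical, and the property keys are written as plain strings so the snippet does not depend on the Kafka client library. A real producer additionally needs key.serializer and value.serializer before these properties can be handed to a KafkaProducer.

```java
import java.util.Properties;

final class OrderedProducerConfigSketch {

    // Builds producer properties that trade throughput for strict ordering.
    static Properties orderedProducerProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // At most one unacknowledged request per connection, so a retried
        // batch cannot overtake a batch that was sent after it.
        props.setProperty("max.in.flight.requests.per.connection", "1");
        return props;
    }
}
```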
Log compaction
Without a key as part of your messages, you will not be able to set the topic configuration cleanup.policy to compact. According to the documentation, "log compaction ensures that Kafka will always retain at least the last known value for each message key within the log of data for a single topic partition."
This nice and helpful setting will not be available without any key.
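To see why the key is essential here, consider this simplified model of what compaction keeps. It is a sketch under assumptions, not Kafka's cleaner: it treats one partition as an array of {key, value} pairs in offset order, ignores segments and delete markers, and uses hypothetical names. It retains only the latest value per key and has nothing to go on when the key is missing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CompactionSketch {

    // records: each entry is {key, value}, in offset order within one partition.
    // Returns the surviving records, ordered by the position of the latest write per key.
    static Map<String, String> compact(String[][] records) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : records) {
            String key = record[0];
            if (key == null) {
                // Without a key there is no way to decide which record supersedes which.
                throw new IllegalArgumentException("compaction needs a key");
            }
            latest.remove(key);         // re-insert so the survivor keeps its later position
            latest.put(key, record[1]);
        }
        return latest;
    }

    public static void main(String[] args) {
        String[][] log = { {"1", "a"}, {"2", "b"}, {"1", "c"} };
        System.out.println(compact(log)); // {2=b, 1=c}
    }
}
```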
Usage of Keys
In real-life use cases, the key of a Kafka message can have a huge influence on the performance and clarity of your business logic.
A key can, for example, be used naturally for partitioning your data. As you can control your consumers to read from particular partitions, this could serve as an efficient filter. Also, the key can include some metadata on the actual value of the message that helps you control the subsequent processing. Keys are usually smaller than values, and it is therefore more convenient to parse a key instead of the whole value. At the same time, you can apply all serializations and schema registration as done with your value also with the key.
As a note, there is also the concept of Header that can be used to store information, see documentation.