I have three MSK clusters: dev, nonprod and prod. They all share the following cluster configuration, and there is no topic-level configuration.
auto.create.topics.enable=false
default.replication.factor=3
min.insync.replicas=2
num.io.threads=8
num.network.threads=5
num.partitions=1
num.replica.fetchers=2
log.retention.hours=100
replica.lag.time.max.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
unclean.leader.election.enable=true
zookeeper.session.timeout.ms=18000
Dev and nonprod are clearing down messages older than 100 hours, as defined by the log.retention.hours=100 setting.
We have a lot more traffic coming through our production cluster, and old messages are not being removed there. Hundreds of thousands of messages older than 400 hours are still on the cluster. I have thought about adding further config settings such as
segment.bytes
segment.ms
to roll the segments sooner, since a segment that hasn't rolled yet may not be eligible for deletion. However, this same config is working nicely in the other clusters, even though they receive less traffic.
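To make the suspicion about unrolled segments concrete, here is a minimal sketch of how I understand time-based retention to work: it acts on whole segments, and a segment only becomes eligible once its newest record is older than the retention window. The `Segment` class, the `expired_segments` helper and the sample offsets and timestamps are all hypothetical, made up for illustration. This is a simplified model, not the broker's actual code.

```python
from dataclasses import dataclass

HOUR_MS = 60 * 60 * 1000
RETENTION_MS = 100 * HOUR_MS  # mirrors log.retention.hours=100


@dataclass
class Segment:
    # Hypothetical stand-in for a log segment; not a Kafka class.
    base_offset: int
    largest_timestamp_ms: int  # timestamp of the newest record in the segment


def expired_segments(segments, now_ms, retention_ms=RETENTION_MS):
    """Return the segments whose newest record is older than the retention window."""
    return [s for s in segments if now_ms - s.largest_timestamp_ms > retention_ms]


# Made-up example data: three segments of one partition.
now = 1_000 * HOUR_MS
segments = [
    Segment(base_offset=0, largest_timestamp_ms=now - 450 * HOUR_MS),
    Segment(base_offset=5000, largest_timestamp_ms=now - 120 * HOUR_MS),
    # Newest record is only 2 hours old, so the whole segment is kept,
    # including any much older records it may also contain.
    Segment(base_offset=9000, largest_timestamp_ms=now - 2 * HOUR_MS),
]

print([s.base_offset for s in expired_segments(segments, now)])
```

Under this model, a single recent (or wrongly future-dated) record timestamp in a segment would keep every older record in that segment on disk, which is the behaviour I am trying to rule in or out for the production cluster.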