I want to describe the following case that occurred on one of our production clusters.
We have an Ambari cluster with HDP version 2.6.4.
The cluster includes 3 Kafka machines, and each Kafka machine has a 5 TB disk.
What we saw is that all the Kafka disks were at 100% usage. The disks were full, and this is the reason that all the Kafka brokers failed.
df -h /kafka
Filesystem Size Used Avail Use% Mounted on
/dev/sdb 5T 5T 23M 100% /var/kafka
After investigation we saw that log.retention.hours is set to 7 days.
So it seems that data is purged only after 7 days, and maybe this is the reason that the Kafka disks are 100% full even though they are huge (5 TB).
What we want to know now is how to avoid this case in the future:
How can we prevent the Kafka disks from reaching full capacity?
What do we need to set in the Kafka configuration in order to purge the Kafka data according to the disk size? Is that possible?
And how do we determine the right value for log.retention.hours? According to the disk size, or something else?
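
To make the size-based question concrete, here is a rough sketch of the kind of calculation involved. Kafka has a log.retention.bytes setting, but it limits the size of each partition's log rather than the disk as a whole, so a per-disk budget has to be divided across the partition replicas hosted on a broker. The function name, the 80% usable share, and the figure of 200 partition replicas per broker are assumptions made up for illustration, not values from our cluster. The segment size used is Kafka's default log.segment.bytes of 1 GiB.

```python
def retention_bytes_per_partition(disk_bytes, partition_replicas_on_broker,
                                  segment_bytes=1024 ** 3, usable_percent=80):
    """Estimate a per-partition log.retention.bytes value for one broker.

    Hypothetical helper for illustration only. Assumes all partitions on the
    broker receive a similar share of the data.
    """
    if partition_replicas_on_broker <= 0:
        raise ValueError("partition_replicas_on_broker must be positive")

    # Keep part of the disk free as headroom (assumed 80% usable).
    budget = disk_bytes * usable_percent // 100

    # Split the budget evenly across the partition replicas on this broker.
    per_partition = budget // partition_replicas_on_broker

    # Retention only removes closed segments, so a log can exceed the limit
    # by roughly one active segment; subtract that from the limit.
    limit = per_partition - segment_bytes
    if limit <= 0:
        raise ValueError("disk budget is too small for this many partitions")
    return limit


if __name__ == "__main__":
    five_tib = 5 * 1024 ** 4
    # 200 partition replicas per broker is an assumed example figure.
    print(retention_bytes_per_partition(five_tib, 200))
```

With these assumed inputs the result is about 20.9 GB per partition. The time-based and size-based limits can be set together, and a segment becomes eligible for deletion when either limit is exceeded, so log.retention.hours could stay at 7 days while the size limit acts as a safety net.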