
In my Ambari cluster (version 2.6)

we have master machines and worker machines, with Kafka installed on the master machines.

The /data partition is only 15G, and the Kafka log folder is /data/var/kafka/kafka-logs.

Most of the folders under /data/var/kafka/kafka-logs are 4K-40K in size,

but two folders are very large (5G-7G), and this causes /data to reach 100% usage.

Example:

Under /data/var/kafka/kafka-logs/mmno.aso.prpl.proces-90:

12K     00000000000000000000.index
1.0G    00000000000000000000.log
16K     00000000000000000000.timeindex
12K     00000000000001419960.index
1.0G    00000000000001419960.log
16K     00000000000001419960.timeindex
12K     00000000000002840641.index
1.0G    00000000000002840641.log
16K     00000000000002840641.timeindex
12K     00000000000004260866.index
1.0G    00000000000004260866.log
16K     00000000000004260866.timeindex
12K     00000000000005681785.index
1.0G    00000000000005681785.log

Is it possible to limit the size of the logs, or is there another solution? Our /data partition is small, so we need to keep the log files from growing to 1G each. How can we solve this?

King David

1 Answer


Kafka has a number of broker/topic configurations for limiting the size of logs. In particular:

  • log.retention.bytes: The maximum size of the log before deleting it
  • log.retention.hours: The number of hours to keep a log file before deleting it

Note that these are not hard bounds, as deletion happens per segment, as described in: http://kafka.apache.org/documentation/#impl_deletes. These broker settings are defaults that can also be overridden per topic, and the size limit applies to each partition's log rather than to the topic as a whole. By setting these you should be able to control the size of your data directory.

See http://kafka.apache.org/documentation/#brokerconfigs for the full list of log.retention.*/log.roll.*/log.segment.* configs.
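
To illustrate why these are not hard bounds, here is a minimal Python sketch of size-based deletion that works on whole segments. It is a simplified model and not Kafka's actual code: the function names are hypothetical, the 1 GiB segment size is assumed from the listing in the question, and the model assumes the newest (active) segment is never deleted.

```python
GIB = 1024 ** 3  # assumed segment size, matching the ~1.0G .log files in the question


def segments_after_size_retention(segment_sizes, retention_bytes):
    """Return the sizes of the segments that survive, oldest first.

    Simplified model: whole segments are dropped from the oldest end for as
    long as the log would still hold at least retention_bytes afterwards.
    A negative retention_bytes means "no size limit".
    """
    kept = list(segment_sizes)
    if retention_bytes < 0:
        return kept
    excess = sum(kept) - retention_bytes
    # Keep the last (active) segment no matter what in this model.
    while len(kept) > 1 and excess - kept[0] >= 0:
        excess -= kept[0]
        kept.pop(0)
    return kept


def rough_partition_ceiling(retention_bytes, segment_bytes):
    """Rough planning figure per partition: the limit plus one full segment.

    Ignores index files and any growth between retention checks.
    """
    return retention_bytes + segment_bytes


# Five 1 GiB segments, as in the partition folder shown in the question.
segments = [GIB] * 5
kept = segments_after_size_retention(segments, int(2.5 * GIB))
print(len(kept), "segments kept,", sum(kept) / GIB, "GiB on disk")
```

With a limit that is not a multiple of the segment size, the surviving log is larger than the limit, because only whole segments can be removed. When sizing a small partition such as the 15G /data here, it may therefore help to also lower the segment size (log.segment.bytes), and to multiply the per-partition figure by the number of partitions hosted on the broker.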

Mickael Maison
  • I think log.retention.hours will not help, because some topic folders under kafka-logs are 5-10G. Even if we reduce it to 10 hours, a folder of that size can still sit on disk for up to 10 hours – King David Oct 03 '17 at 10:43
  • Then use `log.retention.bytes`. I mentioned all the possible configurations you can use to control the log size; it's up to you to find the combination that matches your cluster requirements. – Mickael Maison Oct 03 '17 at 12:44