56

I can see a property in config/server.properties called log.dir. Does this mean Kafka uses the same directory for storing both logs and data?

chicks
Midhun Mathew Sunny

4 Answers

84

Kafka topics are "distributed and partitioned append-only logs". The log.dir parameter defines where topics (i.e., data) are stored.

It is not related to application/broker logging.

The default log.dir is /tmp/kafka-logs which you may want to change in case your OS has a /tmp directory cleaner.
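As a rough illustration of how the data directory is determined, here is a minimal Python sketch. It assumes that log.dirs (a comma-separated list, mentioned elsewhere in this thread) takes precedence over log.dir, and that /tmp/kafka-logs is used when neither is set. The helper names and the sample paths are hypothetical and not part of Kafka itself.

```python
DEFAULT_LOG_DIR = "/tmp/kafka-logs"  # default named in the answer above


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve_data_dirs(props):
    """Return the directories where topic data would be stored.

    Assumption: log.dirs wins over log.dir; the default applies otherwise.
    """
    raw = props.get("log.dirs") or props.get("log.dir") or DEFAULT_LOG_DIR
    return [d.strip() for d in raw.split(",") if d.strip()]


if __name__ == "__main__":
    # Hypothetical excerpt of config/server.properties
    sample = """
    broker.id=0
    log.dirs=/var/lib/kafka/data1,/var/lib/kafka/data2
    """
    print(resolve_data_dirs(parse_properties(sample)))
```

Running a check like this against your own server.properties is a quick way to confirm that topic data is not landing in /tmp by accident.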

Archimedes Trajano
Matthias J. Sax
  • 2
    If no log.dir is defined, then it stores the logs under /tmp/kafka-logs/-, at least on my CentOS 6 machine. – victtim Mar 14 '18 at 14:04
  • 1
    My understanding is that the Kafka data is stored in *.log files, at the location given by the log.dir property. Using "log" here is very confusing; surely it should be changed? – boardtc Sep 12 '18 at 15:34
  • I understand that it can be confusing. Feel free to bring it up at the mailing list. Anybody can suggest changes (the beauty of an Apache open source project). – Matthias J. Sax Sep 12 '18 at 17:04
  • The reason behind using .log is its append capability. Many real-time stream applications use logs to continuously append high volumes of data. – Koushik Paul Mar 03 '19 at 22:32
  • 2
    It's worth noting that the configuration file config/server.properties holds the log.dirs property. – tedyyu Jul 01 '20 at 08:35
19

log.dir or log.dirs in config/server.properties specify the directories in which the log data (i.e., topic data) is kept. The server log directory is kafka_base_dir/logs by default. You can change it by specifying another directory for 'kafka.logs.dir' in log4j.properties.

amethystic
  • As we see, it is `/tmp/kafka-logs` in `apache-kafka` `v0.10.1.1`. Generally `/tmp` is avoided for such an important operation. Is there any rationale behind using `/tmp`, or can we store it somewhere like `/var/logs` as well? I am using `RHEL LVM` on `AWS ec2`. Also posted at https://serverfault.com/questions/923808/what-is-ideal-directory-for-kafka-messages#923852. – Divs Jul 28 '18 at 11:17
  • would be really obliged if you _can_ please spare a couple of mins on these as well https://stackoverflow.com/questions/51557727/kafka-consumer-offsets-growing-in-size, https://stackoverflow.com/questions/51562804/kafka-old-consumer-offsets-are-not-getting-deleted .. – Divs Jul 28 '18 at 11:22
3

log.dir in server.properties is the place where the Kafka broker stores the commit logs containing your data. Typically this will be your high-speed mounted disk for mission-critical use cases.

For application/broker logging you can use general log4j logging to get the event logging in your custom location. Below are the variables to do this.

-Dlog4j.configuration=file:<configuration file with log rolling, logging level etc.>
-Dkafka.logs.dir=<path to logs>
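
To make the two settings concrete, here is a small Python sketch that only assembles the JVM flags named above from two paths. The function name and both example paths are hypothetical. How the flags actually reach the broker JVM depends on your startup scripts and is not shown here.

```python
def logging_jvm_flags(log4j_config_path, logs_dir):
    """Build the two -D flags that control application/broker logging.

    Both arguments are plain filesystem paths chosen by the operator.
    """
    return [
        f"-Dlog4j.configuration=file:{log4j_config_path}",
        f"-Dkafka.logs.dir={logs_dir}",
    ]


if __name__ == "__main__":
    # Hypothetical locations; replace with your own.
    flags = logging_jvm_flags("/etc/kafka/log4j.properties", "/var/log/kafka")
    print(" ".join(flags))
```

Note that neither flag affects log.dir: the topic data stays wherever server.properties points.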
Giorgos Myrianthous
amitsahu07
0

The directory locations of logs and data were described well by Matthias. However, that data is designed for internal processing by the Kafka engine, so you may want to use Kafka Connect to store and manipulate the data. Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. See the picture below:

enter image description here

It makes it simple to define connectors that move large amounts of data into and out of Kafka. Kafka Connect can ingest entire databases, making the data available for stream processing, or sink the data of one or more topics to another system or database for further analysis.

Cassio Seffrin