I am new to Apache Kafka. I have a server [10.33.41.20] on which I receive a log file [/tmp/LsCrak.log] that is updated every second. I have installed Kafka 2.4.0 on another VM, server2 [10.33.41.22]. I am supposed to use the Kafka console producer as shown below. But how can I read the remote file so that I do not lose any data?

kafka-console-producer.sh  --broker-list  kftest1:9092,kftest2:9092,kftest3:9092 --topic kafka-LsCrak-topic &
OneCricketeer
stacktesting
  • Please don't replace your question to invalidate existing answers. If you'd like to ask about UDP data instead of log files, create a new post. If the below has answered your question, you may also accept it – OneCricketeer Apr 12 '22 at 14:22
  • Okay. Since it was captured in a file, thought to add it . Will create a separate one – stacktesting Apr 12 '22 at 16:02
  • If the below answer addresses the question that has been asked here, feel free to use the checkmark next to it to accept it – OneCricketeer Apr 13 '22 at 05:10

1 Answer

receive log file ... which gets updated every second. I am supposed to use the kafka console producer

Don't use the console producer to read files; it doesn't track how far into a file it has already read. In other words, running the console producer again on the same file will send every record again and cause duplicates.

Instead, use tools like Fluentd, Filebeat, Logstash, Kafka Connect Spooldir connector, etc. that actually support tailing files and outputting to Kafka.
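
To illustrate what "tracking progress" means, here is a minimal Python sketch, not a replacement for the tools above. It assumes the log is append-only and newline-delimited, and it stores a byte offset in a side file. The names `read_new_lines`, `save_offset`, `ship_once`, and the offset file path are hypothetical, chosen only for this example.

```python
import os


def read_new_lines(log_path, offset_path):
    """Return (complete lines appended since the saved offset, new offset)."""
    # Load the byte offset saved by the previous run (0 if there is none).
    try:
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    except FileNotFoundError:
        offset = 0

    # If the file is shorter than the saved offset, assume it was
    # truncated or rotated and start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next time
            lines.append(raw.rstrip(b"\r\n"))
            offset += len(raw)
    return lines, offset


def save_offset(offset_path, offset):
    # Write to a temporary file and rename it, so a crash never leaves
    # a half-written offset file behind.
    tmp = offset_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.replace(tmp, offset_path)


def ship_once(log_path, offset_path, send):
    """Pass every new complete line to send(), then record the progress."""
    lines, offset = read_new_lines(log_path, offset_path)
    for line in lines:
        send(line)
    # The offset is saved only after all sends returned, so a crash can
    # cause a few lines to be re-sent, but not skipped.
    save_offset(offset_path, offset)
    return len(lines)


# Possible wiring with kafka-python (assumed to be installed; not run here):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers=["kftest1:9092"])
#   send = lambda line: producer.send("kafka-LsCrak-topic", line).get(timeout=10)
#   ship_once("/tmp/LsCrak.log", "/tmp/LsCrak.offset", send)
```

Calling `ship_once` in a loop (for example once per second) gives at-least-once delivery under these assumptions. The dedicated tools handle rotation, back-pressure, and multiple files far more carefully than this sketch does.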

OneCricketeer
  • Can I use the Python kafka module to read the file instead? – stacktesting Apr 11 '22 at 18:12
  • Also, I am receiving a stream of data from a system, and I have the option to call the Kafka producer directly whenever new data arrives. Currently I just write that data to a file and plan to read the file into Kafka. Which approach is suggested? – stacktesting Apr 11 '22 at 18:23
  • Sure, you can use Python instead of (or in addition to) writing to a file. Like I said, you shouldn't use the console-producer to read from any files that may change over time, unless you want to completely overwrite that file each time before reading it. – OneCricketeer Apr 11 '22 at 19:02
  • I am using the Python code below. How can I alter it to read from STDIN?
    ```
    from json import dumps  # needed by value_serializer below
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=['umbtest1:9092', 'umbtest2:9092', 'umbtest3:9092'],
        value_serializer=lambda x: dumps(x).encode('utf-8'))
    ```
    – stacktesting Apr 12 '22 at 05:05
  • https://stackoverflow.com/questions/1450393/how-do-you-read-from-stdin – OneCricketeer Apr 12 '22 at 13:20
  • No. The output comes from a UDP port via stdin. – stacktesting Apr 12 '22 at 14:13
  • UDP data comes from the network, not stdin... Use the socket module to accept UDP data – OneCricketeer Apr 12 '22 at 14:14
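
As a rough sketch of the last suggestion in the comments, the following uses Python's standard `socket` module to receive UDP datagrams and hand each payload to a callback, which could be a Kafka producer's send method. The function names, the port `5140`, and the `max_datagrams` parameter are assumptions made for this example and do not come from the thread.

```python
import socket


def open_udp_listener(host="0.0.0.0", port=5140):
    """Create a UDP socket bound to host:port (port 5140 is a placeholder)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def forward_udp(sock, send, max_datagrams=None, bufsize=65535):
    """Receive datagrams from a bound UDP socket and pass each payload to send().

    Runs forever when max_datagrams is None; otherwise stops after that
    many datagrams and returns the count.
    """
    count = 0
    while max_datagrams is None or count < max_datagrams:
        payload, _addr = sock.recvfrom(bufsize)  # blocks until a datagram arrives
        send(payload)
        count += 1
    return count


# Possible wiring with kafka-python (assumed to be installed; not run here):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers=["kftest1:9092"])
#   sock = open_udp_listener()
#   forward_udp(sock, lambda p: producer.send("kafka-LsCrak-topic", p))
```

This skips the intermediate log file entirely: each datagram is produced as it arrives. UDP itself gives no delivery guarantee, so datagrams dropped on the network never reach this code.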