
I am trying to feed some netflow data into Kafka. I have some netflow.pcap files which I read with

tcpdump -r netflow.pcap

and get output such as:

14:48:40.823468 IP abts-kk-static-242.4.166.122.airtelbroadband.in.35467 > abts-kk-static-126.96.166.122.airtelbroadband.in.9500: UDP, length 1416
14:48:40.824216 IP abts-kk-static-242.4.166.122.airtelbroadband.in.35467 > abts-kk-static-126.96.166.122.airtelbroadband.in.9500: UDP, length 1416

. . . .

In the official docs they show the traditional way: start a Kafka producer, start a Kafka consumer, and type some data into the producer's terminal, which then shows up in the consumer. Good. Working.
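For reference, that quickstart boils down to something like the following (a sketch assuming a 0.9-era Kafka install with a broker on localhost:9092 and ZooKeeper on localhost:2181; the topic name `test` is just an example):

# terminal 1: every line typed here is sent to the topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

# terminal 2: print the messages as they arrive
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning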

Here they show how to feed a file into the Kafka producer. Mind you, just one single file, not multiple files.

Question is:

How can I feed the output of a shell script into a Kafka broker?

For example, the shell script is:

#!/bin/bash
FILES=/path/to/*
for f in $FILES
do
  tcpdump -r "$f"
done

I can't find any documentation or article where they mention how to do this. Any idea? Thanks!

  • If you're the one generating `pcap` files, from now on, you could pipe `pcap` output directly to the console producer, instead of first saving to files. Then you wouldn't need to worry about the data volume. – Marko Bonaci Jan 26 '16 at 22:33
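For what it's worth, that piping approach could look something like the sketch below; the glob, broker address and topic name are assumptions, and the loop's combined output is piped straight into the console producer with no intermediate file:

#!/bin/bash
# decode every capture in the directory and stream the lines
# directly into the console producer
for f in /path/to/*.pcap
do
  tcpdump -r "$f"
done | kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic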

1 Answer


Well, based on the link you gave about using the shell Kafka producer with an input file, you can do the same with your output: redirect it to a file and then feed that file to the producer.

Note that I used `>>` in order to append to the file rather than overwrite it.

For example:

#!/bin/bash
FILES=/path/to/*
for f in $FILES
do
  tcpdump -r "$f" >> /tmp/tcpdump_output.txt
done

kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic \
--new-producer < /tmp/tcpdump_output.txt
  • Actually, imagine each `netflow.pcap` file is 1GB and there are MANY such files in the directory. The solution you gave would probably not be efficient enough. Don't you think? – HackCode Jan 26 '16 at 09:47
  • Well, you can output each tcpdump to a different file and then iterate over the output files and produce them to kafka. Alternatively, you can install [logstash](https://www.elastic.co/products/logstash) on your machine and configure it to read input from some folder, let's say your output folder where all the tcpdump files are, install the kafka plugin for logstash, and use it to output the content to kafka. – Avihoo Mamka Jan 26 '16 at 09:50
  • Could you please elaborate on that comment? Maybe some preliminary steps? – HackCode Jan 28 '16 at 15:01
  • You need to use `logstash` with the input plugin for [file](https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html) and the output plugin for [kafka](https://www.elastic.co/guide/en/logstash/current/plugins-outputs-kafka.html) – Avihoo Mamka Jan 28 '16 at 15:09
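For anyone trying this, a rough sketch of such a logstash setup, run here with an inline config via `logstash -e` (this assumes the file input and kafka output plugins are installed; the paths, broker address and topic are placeholders, and option names differ between plugin versions, so check the linked docs):

# tail every tcpdump output file in the folder and produce each line
# to a Kafka topic; option names vary across kafka output plugin versions
logstash -e '
input  { file { path => "/tmp/tcpdump_output/*.txt" start_position => "beginning" } }
output { kafka { bootstrap_servers => "localhost:9092" topic_id => "my_topic" } }
'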