
I'm looking to consume data from Kafka and save it into Hadoop and Elasticsearch. I've seen two ways of doing this: using Filebeat to consume from Kafka and send the data to Elasticsearch, or using the Kafka Connect framework, which has Kafka-Connect-HDFS and Kafka-Connect-Elasticsearch modules.

I'm not sure which one to use for streaming data. One consideration: if at some point I want to take data from Kafka and put it into Cassandra, I can use a Kafka Connect module for that, whereas no such feature exists for Filebeat.
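As a rough illustration of the Kafka Connect route, here is a minimal sketch that builds sink configurations reading the same topic into HDFS and Elasticsearch. It assumes Confluent's HDFS and Elasticsearch sink connectors: the connector class names and the keys `topics`, `tasks.max`, `hdfs.url`, `flush.size`, `connection.url`, `key.ignore` and `schema.ignore` come from those connectors. The helper function names, the `logs` topic, the URLs, and the `flush.size` default of 1000 are hypothetical, and some connector versions need extra keys (for example `type.name` on older Elasticsearch connector releases).

```python
"""Sketch: Kafka Connect sink configs that fan one Kafka topic out to HDFS and Elasticsearch.

Helper names are hypothetical; connector classes/keys assume Confluent's sink connectors.
"""
from typing import Dict


def hdfs_sink_config(topic: str, hdfs_url: str, flush_size: int = 1000) -> Dict[str, str]:
    # Assumes Confluent's HDFS sink connector; Connect expects string-valued config entries.
    return {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "tasks.max": "1",
        "topics": topic,
        "hdfs.url": hdfs_url,
        # Records written per file before the file is committed to HDFS
        # (1000 is an arbitrary, hypothetical choice).
        "flush.size": str(flush_size),
    }


def elasticsearch_sink_config(topic: str, es_url: str) -> Dict[str, str]:
    # Assumes Confluent's Elasticsearch sink connector.
    return {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "tasks.max": "1",
        "topics": topic,
        "connection.url": es_url,
        # Use topic+partition+offset as the document id instead of the record key.
        "key.ignore": "true",
        # Don't use the record schema; let Elasticsearch map fields itself.
        "schema.ignore": "true",
    }


# Hypothetical topic and endpoints, for illustration only.
SINKS = {
    "logs-hdfs": hdfs_sink_config("logs", "hdfs://namenode:8020"),
    "logs-es": elasticsearch_sink_config("logs", "http://localhost:9200"),
}
```

Each dict would become the `config` section of one connector on a Connect worker. Adding a third destination later (Cassandra, say) would mean adding another connector config rather than building another ingestion pipeline.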

Adrian
  • I don't understand why Filebeat is used here. It reads files, not TCP messages from Kafka. You don't need Beats, just Logstash. – OneCricketeer Sep 09 '17 at 03:16

1 Answer


Kafka Connect can handle streaming data and is somewhat more flexible. If you are only sending data to Elasticsearch, Filebeat is a clean integration for log sources. However, if you are going from Kafka to a number of different sinks, Kafka Connect is probably what you want. I'd recommend checking out the connector hub for examples of the open-source connectors currently available: http://www.confluent.io/product/connectors/
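To make the "many sinks" point concrete: with Kafka Connect, each new sink is registered by POSTing a JSON body of the form `{"name": ..., "config": {...}}` to a Connect worker's `/connectors` REST endpoint. The sketch below only builds that request with Python's standard library. The helper names, the worker address, the connector name and the file path are hypothetical. The example sink uses the `FileStreamSinkConnector` that ships with Apache Kafka rather than any particular Elasticsearch or HDFS connector.

```python
"""Sketch: registering a sink connector with a Kafka Connect worker via its REST API.

Helper names are hypothetical; only the POST /connectors endpoint shape is assumed.
"""
import json
import urllib.request
from typing import Dict


def create_connector_request(worker_url: str, name: str,
                             config: Dict[str, str]) -> urllib.request.Request:
    # Body shape expected by POST /connectors: {"name": ..., "config": {...}}.
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=worker_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    # Actually sends the request; this needs a running Connect worker.
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, passing `create_connector_request("http://localhost:8083", "logs-file", {"connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector", "topics": "logs", "file": "/tmp/logs.txt"})` to `submit` would ask a worker on the default REST port 8083 to start writing the `logs` topic to a local file. Swapping in an Elasticsearch, HDFS or Cassandra sink changes only the `config` dict.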

dawsaw
  • The thing is, I agree with you, but I don't have any evidence for why one way is better than the other. Would you mind expanding your answer a bit? – Adrian Sep 13 '16 at 00:01
  • Full disclosure: I'm coming at this from the Kafka perspective. I think Kafka Connect is generally more flexible and pluggable for moving Kafka data to or from another data store. Filebeat specializes in moving data into Elasticsearch, so it isn't general-purpose by design. – dawsaw Sep 13 '16 at 23:42
  • Is there any information comparing the performance of these options? – imehl Oct 27 '16 at 14:40
  • Logstash is the flexible output component of the Elastic Stack. https://www.elastic.co/guide/en/logstash/current/output-plugins.html – OneCricketeer Sep 09 '17 at 03:12
  • @dawsaw your answer is apt, as Filebeat is only for shipping log sources. In the case of Kafka's own log files (server*, state-change*, etc.), Filebeat uses the Kafka module. – Abdurrahman Adebiyi Jan 19 '18 at 17:59