I'm a beginner with Apache NiFi, and so far all the tutorials I have read talk about integrating Kafka with NiFi. How does Kafka complement NiFi? Why don't we use NiFi directly to publish our messages, without using Kafka?

Note: None of the tutorials I have seen address this point.

OneCricketeer
BERGUIGA Mohamed Amine

3 Answers

22

NiFi and Kafka complement each other in the sense that NiFi is not a message queue like Apache Kafka. Rather, Apache NiFi is a data-flow management (aka data logistics) tool.

Let's assume this scenario: you have messages (in JSON format) being streamed through Kafka, and you want to validate them, checking that each message has all the required fields and that the fields are valid. The valid messages should then land in HBase.

Here NiFi can help you with the following approach:

  • NiFi has ConsumeKafka processors, which you can configure with your Kafka brokers and the consumer group ID.
  • Use the NiFi processor ValidateRecord to check whether the received messages are valid.
  • If they are valid, connect the output to PutHBaseRecord.

In summary, NiFi saves you from writing a lot of boilerplate code. In this case, that would be the custom logic for schema validation and for writing to HBase.
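To give a feel for the kind of boilerplate this replaces, here is a minimal sketch of hand-written validation and routing in plain Python. It is not NiFi or Kafka code: the `REQUIRED_FIELDS` schema and the `is_valid` and `route` helpers are hypothetical names made up for illustration, and the Kafka consumer and HBase writer you would also need are left out entirely.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"id": int, "name": str, "timestamp": str}


def is_valid(raw_message: str) -> bool:
    """Return True if the message is a JSON object with every required field of the expected type."""
    try:
        record = json.loads(raw_message)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    return all(
        field in record and isinstance(record[field], expected)
        for field, expected in REQUIRED_FIELDS.items()
    )


def route(messages):
    """Split raw messages into (valid, invalid) lists, loosely like routing to valid/invalid relationships."""
    valid, invalid = [], []
    for raw in messages:
        (valid if is_valid(raw) else invalid).append(raw)
    return valid, invalid
```

In NiFi, the equivalent behaviour is configured on the processors rather than coded, and you would still have to add consuming, retries, and the HBase write on top of a sketch like this.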

Sivaprasanna Sethuraman
  • I just wonder if it's possible to implement NiFi FlowFile storage in Kafka, so that a NiFi queue would be a Kafka queue. Would it be efficient? – daggett Nov 29 '18 at 11:20
  • NiFi's queues and repositories are very specific to the needs of NiFi, so I'm not sure that would be possible – Bryan Bende Nov 29 '18 at 14:03
  • @daggett You'd have to rewrite how NiFi stores FlowFiles. Kafka also has a default max message size of about 1 MB, whereas a FlowFile can be larger than that. Assuming that is addressed, you would still need to implement a Kafka serializer for FlowFiles – OneCricketeer Nov 29 '18 at 15:52
9

I found an interesting answer on the Hortonworks community questions site; I am sharing it here for the sake of completeness:

  • Apache NiFi and Apache Kafka are two different tools with different use-cases that may slightly overlap. Here is my understanding of the purpose of the two projects.

    NiFi is "An easy to use, powerful, and reliable system to process and distribute data."

    It is a visual tool (with a REST API) that implements flow-based programming, enabling the user to craft flows that take data from a large variety of sources, perform enrichment, routing, etc. on the data as it is being processed, and output the result to a large variety of destinations. During this process, it captures metadata (provenance) on what has happened to each piece of data (FlowFile) as it made its way through the flow, for audit logging and troubleshooting purposes.

  • "Apache Kafka is publish-subscribe messaging rethought as a distributed commit log"

    It is a distributed implementation of the publish-subscribe pattern that allows developers to connect programs to each other in different languages and across a large number of machines. It is more of a building block for distributed computing than it is an all-in-one solution for processing data.
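    The "commit log" idea can be illustrated with a toy, in-memory model. This is only a sketch of the concept under simplifying assumptions (one partition, no persistence, no network); `ToyCommitLog`, `publish`, and `poll` are hypothetical names and not Kafka's client API.

```python
from collections import defaultdict


class ToyCommitLog:
    """Toy single-partition log: records are appended once and read by offset."""

    def __init__(self):
        self._records = []                 # append-only sequence of records
        self._offsets = defaultdict(int)   # next offset to read, per consumer group

    def publish(self, record):
        """Append a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next batch for this group and advance only that group's offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch
```

    Because reading does not remove records, each consumer group sees the full stream at its own pace, which is the property that lets independent programs subscribe to the same data.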

amiabl
7

In addition to the previous answers, here is a valuable resource that explains very clearly how to combine both technologies, and especially why you would do so, with illustrated examples.

I found it very useful, and it is my go-to reference whenever I need a refresher on this topic.

Kafka / NiFi : Better together

In short:

NiFi and Kafka Are Complementary

NiFi
• Provides dataflow solution
• Centralized management, from edge to core
• Great traceability, event level data provenance starting when data is born
• Interactive command and control
• Real time operational visibility
• Dataflow management, including prioritization, back pressure, and edge intelligence
• Visual representation of global dataflow
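
As a rough illustration of the back pressure item above, the sketch below models a connection between two processing steps as a bounded queue that refuses new items once a threshold is reached. It is a conceptual toy, not NiFi's implementation; `BoundedConnection`, `offer`, and `take` are hypothetical names.

```python
from collections import deque


class BoundedConnection:
    """Toy queue between two steps that applies back pressure at a size threshold."""

    def __init__(self, threshold):
        self.threshold = threshold   # maximum number of queued items
        self._queue = deque()

    def offer(self, item) -> bool:
        """Enqueue the item, or return False so the upstream step knows to slow down."""
        if len(self._queue) >= self.threshold:
            return False
        self._queue.append(item)
        return True

    def take(self):
        """Remove and return the oldest item, or None if the queue is empty."""
        return self._queue.popleft() if self._queue else None
```

Once the downstream step drains the queue with `take`, `offer` starts succeeding again, so a slow consumer throttles its producer instead of being overwhelmed.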

Kafka
• Provides durable stream store
• Low latency
• Distributed data durability
• Decentralized management of producers & consumers
• And much, much more...

Mehdi LAMRANI