
I wrote a Kafka Streams application and I want to deploy it on the Kafka cluster. So I built a jar file and ran it using the command:

 java -jar KafkaProcessing-1.0-SNAPSHOT-jar-with-dependencies.jar testTopic kafka1:9092,kafka2:9092 zookeeper1:2181,zookeeper2:2181 output

It runs correctly, but the job runs on the machine where I executed the command above! I thought that when I specified the bootstrap servers, the computation would automatically happen on the cluster, not on the host machine!

So my question is: how can I submit a Kafka Streams job to the Kafka cluster, the way Spark and Flink provide the spark-submit and flink run commands to deploy applications on a cluster?

Soheil Pourbafrani

2 Answers


Kafka Streams has a different architecture - it doesn't need cluster orchestration like Spark/Flink. Kafka Streams applications are just normal applications that you can start and stop: when you start more instances they scale up, and when you stop instances they scale down. Internally they use Kafka itself to coordinate data processing, similarly to other Kafka consumers.

If you have Kubernetes, Docker Swarm, or another similar platform, you can package your app into a Docker image and use that platform to run your Kafka Streams app.
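To make that concrete: a Kafka Streams app is just a Java program with a `main` method; `bootstrap.servers` only tells it where the brokers are, while the processing runs in whatever JVM you start. A minimal sketch, assuming the `kafka-streams` dependency is on the classpath and using the topic names from the question (the class name and the uppercase transform are illustrative):

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ProcessingApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // All instances sharing this application.id form one consumer group
        // and split the input topic's partitions among themselves.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-processing");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1:9092,kafka2:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from the input topic, transform, and write to the output topic.
        builder.stream("testTopic")
               .mapValues(value -> value.toString().toUpperCase())
               .to("output");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

There is nothing to submit: `java -jar ...` on any machine that can reach the brokers is the deployment, and starting the same jar on a second machine adds capacity automatically.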

Alex Ott
  • So Kafka Streams is not a parallel processing engine? – Soheil Pourbafrani Dec 02 '17 at 07:51
  • It allows processing data in parallel, but it is different... you can increase the number of threads inside an instance, but they will all execute the same code. See https://stackoverflow.com/questions/39985048/kafka-streaming-concurrency for a more detailed explanation – Alex Ott Dec 02 '17 at 07:58
  • 2
    See https://www.confluent.io/blog/elastic-scaling-in-kafka-streams/ for some details on elastic scaling. In short: just run multiple instances of your application. Need 5x processing power? Run 5 instances. Need 10x? Run 10 instances. And so on. One advantage of Kafka Streams over Spark, Flink, and Storm is that you can change parallelism during live operations (no downtime) -- you can add/remove instances while your application is running to add/remove processing capacity. – miguno Dec 04 '17 at 08:21
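The scaling model described in the comments above comes from Kafka's consumer-group rebalancing: the input topic's partitions are divided among however many instances are alive at the moment. A small stdlib-only illustration of that idea (the round-robin assignment here is a simplification of Kafka's real partition assignor):

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSpread {
    // Simplified round-robin assignment: partition p goes to instance p % n.
    static List<List<Integer>> assign(int partitions, int instances) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < instances; i++) result.add(new ArrayList<>());
        for (int p = 0; p < partitions; p++) result.get(p % instances).add(p);
        return result;
    }

    public static void main(String[] args) {
        // A 6-partition topic with one instance running: it does all the work.
        System.out.println(assign(6, 1)); // [[0, 1, 2, 3, 4, 5]]
        // Start two more instances (same application.id): the same partitions
        // are rebalanced across three JVMs, with no resubmission needed.
        System.out.println(assign(6, 3)); // [[0, 3], [1, 4], [2, 5]]
    }
}
```

The point of the sketch: adding or removing an instance only changes the divisor, which is why Kafka Streams can rescale live while Spark/Flink jobs are sized at submission time.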

At my organization we use a Kafka Streams application, and we explored this option of deploying it to the cluster. This facility is simply not provided: a Kafka Streams application runs wherever you start it. There is no job-submission option yet.

Anony-mouse
  • So how can we set the parallelism level? Is it designed to run in parallel? – Soheil Pourbafrani Dec 01 '17 at 20:58
  • As of now, we have different machines running the streams in parallel. You can probably go that way as well. – Anony-mouse Dec 02 '17 at 06:59
  • You mean other stream processing engines like Flink and Storm? – Soheil Pourbafrani Dec 02 '17 at 07:17
  • 2
    Parallelism level: Kafka Streams works differently (and easier) than processing frameworks like Storm or Flink that require you to run a Storm or Flink processing cluster. With the Kafka Streams library, you build normal Java/Scala/... applications. Even so, your applications will be elastic, scalable, distributed, fault-tolerant, etc. See https://www.confluent.io/blog/elastic-scaling-in-kafka-streams/ for some more details on e.g. elastic scaling. In short: just run multiple instances of your application. Need 5x processing power? Run 5 instances. Need 10x? Run 10 instances. And so on. – miguno Dec 04 '17 at 08:20
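To make the parallelism discussion above concrete: besides running more instances, Kafka Streams exposes a `num.stream.threads` setting for threads within one instance, and total useful parallelism is capped by the input topic's partition count. A stdlib-only sketch (the config keys `application.id`, `bootstrap.servers`, and `num.stream.threads` are real Kafka Streams settings; the numbers are illustrative):

```java
import java.util.Properties;

public class ParallelismConfig {
    // Kafka Streams creates one stream task per input partition, so threads
    // beyond the partition count sit idle.
    static int activeTasks(int partitions, int instances, int threadsPerInstance) {
        return Math.min(partitions, instances * threadsPerInstance);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // All instances sharing this application.id belong to one consumer group.
        props.setProperty("application.id", "kafka-processing");
        props.setProperty("bootstrap.servers", "kafka1:9092,kafka2:9092");
        // Parallelism inside a single instance: each thread runs the same
        // topology against its share of the input partitions.
        props.setProperty("num.stream.threads", "4");

        // With a 12-partition input topic, 3 instances x 4 threads saturate
        // the topic: 12 tasks, one per partition.
        System.out.println("active tasks: " + activeTasks(12, 3, 4));
    }
}
```

So "setting the parallelism level" is a combination of the topic's partition count (the upper bound), the number of instances you start, and `num.stream.threads` per instance.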