You can theoretically run anything on YARN, even a simple hello world program. Which is why saying Kafka-Connect runs on YARN is technically correct. The caveat is that getting Kafka-Connect to run on YARN will take a fair amount of elbow grease at the moment. There are two ways to do it:
- Directly talk to the YARN API to acquire a container, deploy the Kafka-Connect binaries and launch Kafka-Connect.
- Use the separate Slider project https://slider.incubator.apache.org/docs/getting_started.html that Stephane has already mentioned in the comments.
Slider
You'll have to read quite a bit of documentation to get it working but the idea behind Slider is that you can get any program to run on YARN without dealing with the YARN API and writing a YARN app master by doing the following:
- Create a slider package out of your program
- Define a configuration for you package
- Use the slider cli to deploy your application onto YARN
Slider handles container deployment and recovery of failed containers for you, which is nice. Also Slider is becoming a native part of YARN when YARN 3.0 is released.
Alternatives
Also as a side note, getting Kafka-Connect to deploy on Kubernetes or Mesos / Marathon is probably going to be easier. The basic workflow to do that would be:
- Create a Kafka-Connect docker container or just use confluent's docker container
- Create a deployment config for Kubernetes or Marathon
- Click a button / run a command
Tutorials
- A good Mesos / Marathon tutorial can be found here
- Kubernetes tutorial here
- Confluent Kubernetes Helm Charts here