I am exploring different options for packaging a Spark application, and I am confused about which mode is best and what the differences are between the following modes:
- Submit the application's jar to spark-submit.
- Build a fat jar from the Spark Gradle project and run the jar as a standalone Java application.

I have tried both approaches, but my requirement is to package the Spark application inside a Docker container. Running a fat jar looks easier to me, but as a newbie I don't have a clear idea of the restrictions I might face with the fat jar approach (leaving aside the fact that the fat jar may grow in size).

Could you please share your inputs?
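For reference, the entry point class would be the same in both modes; only how the jar is launched and what it bundles differ. A rough sketch of what I mean (class and jar names are just placeholders):

import org.apache.spark.sql.SparkSession

object KafkaConsumerApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("local-spark-kafka-consumer")
      // needed when the fat jar is run directly, e.g. java -jar app-all.jar;
      // with spark-submit the master would instead be passed on the command line:
      //   spark-submit --master <master-url> --class KafkaConsumerApp app.jar
      .master("local[*]")
      .getOrCreate()

    // ... streaming job ...

    spark.stop()
  }
}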
Is it possible to set up a Spark cluster, including the driver and executors, programmatically? For example:
import org.apache.kafka.common.serialization.{LongDeserializer, StringDeserializer}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils

try {
  val conf = new SparkConf()
  conf.setMaster("local") // note: the builder's .master("local[*]") below also sets the master; one of the two should go
  conf.set("spark.submit.deployMode", "client") // was "deploy-mode", which is not a valid Spark property name
  conf.set("spark.executor.instances", "2")
  conf.set("spark.driver.bindAddress", "127.0.0.1")
  conf.setAppName("local-spark-kafka-consumer")

  val sparkSession = SparkSession
    .builder()
    .master("local[*]")
    .config(conf)
    .appName("Spark SQL data sources example")
    .getOrCreate()

  val sc = sparkSession.sparkContext
  val ssc = new StreamingContext(sparkSession.sparkContext, Seconds(5))

  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "localhost:9092,localhost:9093",
    "key.deserializer" -> classOf[LongDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "consumerGroup10",
    "auto.offset.reset" -> "earliest",
    "max.poll.records" -> "1",
    "enable.auto.commit" -> (false: java.lang.Boolean))

  val topics = Array("topic1")
  // key type is Long to match the LongDeserializer configured above
  val stream = KafkaUtils.createDirectStream[Long, String](...)

  ssc.start()
  ssc.awaitTermination()
} catch {
  case e: Exception => println(e)
}
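The createDirectStream arguments are elided above; assuming the spark-streaming-kafka-0-10 integration, the call would look roughly like this (key type Long to match the configured deserializer):

import org.apache.spark.streaming.kafka010.{ConsumerStrategies, LocationStrategies}

val stream = KafkaUtils.createDirectStream[Long, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[Long, String](topics, kafkaParams))

// at least one output operation must be registered before ssc.start()
stream.map(record => (record.key, record.value)).print()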