
I know this question has a duplicate, but my use case is a little specific. I want to run my Spark job (compiled to a .jar) on EMR via spark-submit and pass two options like this:

spark-submit --master yarn --deploy-mode cluster <rest of command>

To achieve this, I wrote the code like this:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

val sc = new SparkContext(new SparkConf())
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()

However, this throws the following error while building the jar:

org.apache.spark.SparkException: A master URL must be set in your configuration

So what's a workaround? How do I set these two values in code so that the master and deploy-mode options are picked up at submit time, while still being able to use the variables sc and spark in my code (e.g. val x = spark.read...)?

1 Answer


You could simply access the command-line arguments as below and pass in as many values as you want.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Test App")
  .master(args(0)) // master URL read from the first command-line argument
  .getOrCreate()


spark-submit --master yarn --deploy-mode cluster <your-application-jar> <master-url>

Everything after the jar path is forwarded to your main method, so <master-url> (for example yarn) arrives as args(0).
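A slightly more defensive variant is to call .master() only when an argument was actually supplied, so that otherwise the value from spark-submit's --master flag (which populates spark.master in the config) is used. This is just a sketch under that assumption; the object name and the argument convention are made up for illustration:

import org.apache.spark.sql.SparkSession

object TestApp {
  def main(args: Array[String]): Unit = {
    val builder = SparkSession.builder().appName("Test App")

    // Hypothetical convention: the first positional argument, if present,
    // is an explicit master URL override; otherwise spark-submit's
    // --master flag takes effect via the config.
    val spark = args.headOption
      .fold(builder)(builder.master)
      .getOrCreate()

    val sc = spark.sparkContext // the SparkContext from the question

    // ... job logic ...

    spark.stop()
  }
}

Note that deploy mode is a launcher-side setting: --deploy-mode cluster (or the spark.submit.deployMode property) can only take effect at spark-submit time, not from inside the driver code.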

If you need a fancier command-line parser, you can take a look at scopt: https://github.com/scopt/scopt
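As a minimal sketch using scopt's 3.x OptionParser API (the Config case class and option names here are hypothetical, purely for illustration):

import org.apache.spark.sql.SparkSession

// Hypothetical options for illustration; adjust to your job's needs.
case class Config(master: String = "yarn", appName: String = "Test App")

object ScoptApp {
  def main(args: Array[String]): Unit = {
    val parser = new scopt.OptionParser[Config]("spark-job") {
      opt[String]('m', "master")
        .action((x, c) => c.copy(master = x))
        .text("Spark master URL, e.g. yarn or local[*]")
      opt[String]('n', "name")
        .action((x, c) => c.copy(appName = x))
        .text("Spark application name")
    }

    parser.parse(args, Config()) match {
      case Some(cfg) =>
        val spark = SparkSession.builder()
          .appName(cfg.appName)
          .master(cfg.master)
          .getOrCreate()
        // ... job logic ...
        spark.stop()
      case None =>
        sys.exit(1) // scopt has already printed a usage message
    }
  }
}

You would then submit with something like spark-submit --deploy-mode cluster <your-application-jar> --master yarn --name "Test App", since the options after the jar path are forwarded to main.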

koiralo