
I have gone through the Spark documentation for configuration (here). Still, I have a doubt. I am kind of a newbie to Spark, so please clarify this for me or point me to the correct reference.

I want to know the priority, i.e. the order of precedence, of Spark properties set in these locations when executing a job.

  1. Spark Program

    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("CountingSheep")

  2. spark-submit

    ./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false

  3. spark-env.sh (example sketched below)

  4. spark-defaults.conf (example sketched below)
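
Since items 3 and 4 have no snippet above, here is a rough sketch of what entries in those two files typically look like. The property names and variable names are real Spark settings, but the values and paths are only illustrative:

    # conf/spark-defaults.conf -- plain "key value" pairs, one property per line
    spark.master            spark://master-host:7077
    spark.eventLog.enabled  false
    spark.executor.memory   2g

    # conf/spark-env.sh -- a shell script that exports environment variables
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
    export SPARK_WORKER_CORES=4
    export SPARK_WORKER_MEMORY=8g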

As per the Spark documentation, parameters set on the SparkConf object take first priority, then the flags passed to spark-submit, and next spark-defaults.conf. I am a bit confused there: why do we have two files, spark-env.sh and spark-defaults.conf?

    Possible duplicate of [Add jars to a Spark Job - spark-submit](http://stackoverflow.com/questions/37132559/add-jars-to-a-spark-job-spark-submit) – Yuval Itzchakov Feb 01 '17 at 08:51
  • @YuvalItzchakov Thanks for the quick answer. I have updated the question now. Just clarify that small difference for me. Thank you. – srinivas amara Feb 01 '17 at 09:26

1 Answer


As can be seen in the documentation (especially toward the end of the paragraph):

Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file.

So, reading from lowest to highest precedence, the order is 4, 2, 1: spark-defaults.conf is overridden by flags passed to spark-submit, which in turn are overridden by values set directly on the SparkConf in your program.
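
As a concrete illustration (my sketch, not from the original answer; the app name and master values are just examples), the following minimal Scala program sets the master explicitly on its SparkConf, so that value wins even if a different one is passed on the spark-submit command line. Printing sc.getConf.toDebugString shows the fully merged configuration, which is a handy way to check which source actually won:

    import org.apache.spark.{SparkConf, SparkContext}

    object ConfPrecedenceCheck {
      def main(args: Array[String]): Unit = {
        // (1) Values set directly on SparkConf: highest precedence.
        val conf = new SparkConf()
          .setAppName("ConfPrecedenceCheck")
          .setMaster("local[2]")   // beats --master passed to spark-submit

        val sc = new SparkContext(conf)

        // Print the fully resolved configuration after SparkConf,
        // spark-submit flags and spark-defaults.conf have been merged.
        println(sc.getConf.toDebugString)

        // Even if the job was submitted with --master local[4],
        // this prints "local[2]".
        println(s"spark.master = ${sc.getConf.get("spark.master")}")

        sc.stop()
      }
    }

Submitting it with something like ./bin/spark-submit --master local[4] --class ConfPrecedenceCheck app.jar should still report spark.master = local[2], while a property that appears only in spark-defaults.conf shows up unchanged because nothing higher up overrides it.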

spark-env.sh defines environment variables rather than Spark configuration properties. Sourcing it is the same as exporting those environment variables yourself (although spark-env.sh would override values already present in your environment). In most cases these variables are aimed at backward compatibility, so their effect has to be checked on a case-by-case basis (but what the Spark program sets always wins).
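
To see this interaction for one specific setting, a small sketch (mine, not from the answer) is to compare a legacy environment variable with its property-based equivalent from inside spark-shell, where `sc` is already defined. SPARK_EXECUTOR_MEMORY is assumed here as an example of a variable kept for backward compatibility, with spark.executor.memory as the corresponding property:

    // Paste into spark-shell.
    // Value coming from spark-env.sh (or your shell environment), if any:
    val fromEnv  = sys.env.get("SPARK_EXECUTOR_MEMORY")
    // Value coming from SparkConf / spark-submit / spark-defaults.conf, if any:
    val fromConf = sc.getConf.getOption("spark.executor.memory")
    println(s"SPARK_EXECUTOR_MEMORY=$fromEnv, spark.executor.memory=$fromConf")

Checking which of the two your executors actually end up using is exactly the kind of case-by-case verification described above.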

Assaf Mendelson
  • Thanks, that makes sense to me. I read somewhere that "If you define environment variables in spark-env.sh, those values override any of the property values you set in spark-defaults.conf". Is this correct? – srinivas amara Feb 01 '17 at 09:29
  • In general, spark is moving away from using environment variables to set stuff. It still has many for backward compatibility and in some cases for how spark-submit/spark-shell/pyspark etc. behave. For those that set the behavior of spark-submit etc. they override the defaults but for those that are set in backward compatibility you will have to check yourself on a case by case basis. In general I would avoid using environment variables whenever possible (not always possible). – Assaf Mendelson Feb 01 '17 at 09:36