
I am new to Spark, and as I am learning the framework I have figured out that, to the best of my knowledge, there are two ways to run a Spark application written in Scala:

  1. Package the project into a JAR file and then run it with the `spark-submit` script.
  2. Run the project directly with `sbt run`.

I am wondering what the difference between these two modes of execution is, especially since running with `sbt run` can throw a `java.lang.InterruptedException` while the same application runs perfectly with `spark-submit`.
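
To make this concrete, here is a minimal sketch of the kind of application I mean (the object name, versions, and JAR path are only illustrative), together with the two ways of launching it:

```scala
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SimpleApp")
      // Hard-coding a master like this is only needed for `sbt run`, where nothing else
      // supplies one; with spark-submit you would normally drop this line and pass --master.
      .master("local[*]")
      .getOrCreate()

    println(spark.range(1000).count()) // a trivial job just to exercise the session

    spark.stop() // stop the session so the JVM can exit cleanly
  }
}

// Mode 1: sbt package, then (the JAR path depends on the build settings)
//   spark-submit --class SimpleApp target/scala-2.12/simple-app_2.12-0.1.jar
// Mode 2: sbt run
```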

Thanks!

YACINE GACI
  • https://stackoverflow.com/questions/24238060/how-to-run-jar-generated-by-package-possibly-with-other-jars-under-lib – thebluephantom Dec 23 '18 at 10:09
  • `sbt run` will run your project on your local machine - it is good for local testing/debugging during development. `spark-submit` is the preferable way to run your project on a production environment, especially because it will handle the distribution of your program across the cluster. – Luis Miguel Mejía Suárez Dec 23 '18 at 18:54

2 Answers


SBT is a build tool (that I like running on Linux) that does not necessarily imply Spark usage. It just so happens that it is used, like IntelliJ, for building Spark applications.

You can package and run an application in a single JVM under the SBT console, but not at scale. So, if you have created a Spark application with its dependencies declared, SBT will, via `package`, compile the code and create a JAR file with the required dependencies resolved etc. so that you can run it locally.
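
As an illustration only (the name and version numbers are mine, not from the question), a minimal `build.sbt` for such a project could look like this; `sbt package` then produces a thin JAR containing just your own classes:

```scala
// build.sbt -- illustrative only; adjust the name, Scala version and Spark version to your project
name := "simple-app"
version := "0.1"
scalaVersion := "2.12.18"

// Declared dependency on Spark: resolved for compilation and for the local classpath,
// but NOT bundled into the JAR that `sbt package` builds.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1"

// sbt package  ->  target/scala-2.12/simple-app_2.12-0.1.jar (your classes only)
```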

You can also use the `assembly` option in SBT, which creates an uber JAR or fat JAR with all dependencies contained in the JAR, which you upload to your cluster and run by invoking `spark-submit`. So, again, if you have created a Spark application with its dependencies declared, SBT will, via `assembly`, compile the code and create an uber JAR with all required dependencies etc. (except any external files you need to ship to the Workers) to run on your cluster (in general).
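
A sketch of that assembly setup (the plugin version, Spark version, and paths are illustrative, not prescriptive):

```scala
// project/plugins.sbt -- brings in the sbt-assembly plugin (version illustrative)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

// build.sbt -- mark Spark itself as "provided" so the uber JAR does not re-bundle it;
// spark-submit already puts Spark on the classpath of the driver and executors
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1" % "provided"

// Build the fat JAR and ship it to the cluster (paths depend on your name/version settings):
//   sbt assembly
//   spark-submit --class SimpleApp --master yarn target/scala-2.12/simple-app-assembly-0.1.jar
```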

thebluephantom
  • Thanks! A very thorough explanation. So, if I understood correctly, `sbt run` will execute the Spark application locally even if I specify in the SparkConf object that I want to use a master other than local (YARN or Mesos, for example)? – YACINE GACI Dec 26 '18 at 21:58
  • Never tried that. I am an SBT assembly man! I never set the configuration and mode inside the program. See https://spark.apache.org/docs/latest/configuration.html – thebluephantom Dec 26 '18 at 22:15

`sbt` and `spark-submit` are two completely different things:

  1. `sbt` is a build tool. If you have created a Spark application, `sbt` will help you compile the code and create a JAR file with the required dependencies etc.
  2. `spark-submit` is used to submit a Spark job to a cluster manager. You may be using standalone, Mesos, or YARN as your cluster manager. `spark-submit` submits your job to the cluster manager, and your job starts on the cluster (see the sketch after this list).
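
For example (the class name, hosts, and JAR name are illustrative), the application code is submitted unchanged and only the `--master` value decides which cluster manager runs it:

```scala
//   spark-submit --class SubmitDemo --master spark://host:7077 app.jar            (standalone)
//   spark-submit --class SubmitDemo --master mesos://host:5050 app.jar            (Mesos)
//   spark-submit --class SubmitDemo --master yarn --deploy-mode cluster app.jar   (YARN)
import org.apache.spark.sql.SparkSession

object SubmitDemo {
  def main(args: Array[String]): Unit = {
    // No .master() in the code: spark-submit injects it for whichever cluster manager was chosen.
    val spark = SparkSession.builder().appName("SubmitDemo").getOrCreate()
    println(s"Master in use: ${spark.sparkContext.master}") // e.g. "yarn" or "spark://host:7077"
    spark.stop()
  }
}
```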

Hope this helps.

Cheers!

Harjeet Kumar