Are there pros and cons, or different use cases, for submitting a Python script with spark-submit versus simply running the .py file with the python executable (and importing SparkSession), like this?

from pyspark.sql import SparkSession
master = "local[*]"  # placeholder; substitute your own master URL
spk = SparkSession.builder.master(master).getOrCreate()

Basically, are there any differences between running the script via python and running it via spark-submit?

Luke W
  • Possible duplicate of [What is the difference between spark-submit and pyspark?](https://stackoverflow.com/questions/26726780/what-is-the-difference-between-spark-submit-and-pyspark) – vmg Jun 01 '17 at 15:59
  • pyspark runs inside a spark shell, yeah? In this case, I just want to run the script via `python` and not spark-submit. – Luke W Jun 01 '17 at 16:03

1 Answer

spark-submit is mostly a convenience method. It allows you to set all desired configuration, environment variables, and other options on submit.
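
As a rough illustration of what setting options "on submit" looks like, the sketch below assembles a spark-submit command line from Python. The helper name `build_spark_submit_command`, the script name `job.py`, and the example values are hypothetical; `--master`, `--conf`, and `--driver-java-options` are standard spark-submit options.

```python
import shlex


def build_spark_submit_command(script, master, conf=None, driver_java_options=None):
    """Assemble a spark-submit argument list (hypothetical helper, for illustration)."""
    cmd = ["spark-submit", "--master", master]
    # Each Spark property is passed as its own --conf key=value pair.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # JVM flags for the driver are supplied at launch time.
    if driver_java_options:
        cmd += ["--driver-java-options", driver_java_options]
    # The application script comes last.
    cmd.append(script)
    return cmd


cmd = build_spark_submit_command(
    "job.py",  # hypothetical script name
    master="local[2]",
    conf={"spark.executor.memory": "2g"},
    driver_java_options="-XX:+UseG1GC",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list could be handed to `subprocess.run(cmd)` on a machine where spark-submit is on the PATH.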

It also allows you to set JVM options, which cannot be changed on a virtual machine that is already running. Because the JVM is initialized as soon as the Spark configuration is created, the same options cannot be applied afterwards from the running Python process.
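
When the script is started with plain python, one commonly used workaround is to set the `PYSPARK_SUBMIT_ARGS` environment variable before the first SparkSession is created, since PySpark reads it when it launches the JVM. This is a minimal sketch, not a definitive recipe: the helper name `set_submit_args` and the option values are illustrative assumptions, and the Spark lines are commented out so the snippet runs without a Spark installation.

```python
import os


def set_submit_args(options):
    """Store spark-submit style options where PySpark looks for them (illustrative helper)."""
    # The value is expected to end with "pyspark-shell".
    os.environ["PYSPARK_SUBMIT_ARGS"] = " ".join(options + ["pyspark-shell"])
    return os.environ["PYSPARK_SUBMIT_ARGS"]


# This has to happen before the JVM starts, i.e. before any SparkSession exists.
submit_args = set_submit_args(
    ["--driver-memory", "2g", "--driver-java-options", "-XX:+UseG1GC"]
)

# With Spark installed, the session would be created only after the variable is set:
# from pyspark.sql import SparkSession
# spk = SparkSession.builder.master("local[2]").getOrCreate()
```

Setting the variable after the session already exists has no effect on the running JVM, which matches the limitation described above.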

user8098908
  • after running side-by-side, it also appears that with spark-submit, logging is more verbose by default, and spark-submit also handles cleanup chores, both on failure and success. – Luke W Jul 26 '17 at 20:25