The common way of running a Spark job appears to be with spark-submit, as below (source):
spark-submit --py-files pyfile.py,zipfile.zip main.py --arg1 val1
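To make the structure of that command explicit, here is a minimal sketch that assembles it from its parts. It assumes the usual spark-submit layout: options such as --py-files come before the main script, and anything after the script is passed to the script itself. `build_spark_submit` is a hypothetical helper written for illustration, not part of Spark.

```python
import shlex
from typing import List, Sequence


def build_spark_submit(main_script: str,
                       py_files: Sequence[str] = (),
                       app_args: Sequence[str] = ()) -> List[str]:
    """Return the argv list for a spark-submit call (hypothetical helper)."""
    cmd = ["spark-submit"]
    if py_files:
        # --py-files takes a single comma-separated list of dependencies
        cmd += ["--py-files", ",".join(py_files)]
    # spark-submit options go before the main script;
    # everything after the script is handed to the script as its own arguments.
    cmd.append(main_script)
    cmd.extend(app_args)
    return cmd


cmd = build_spark_submit("main.py",
                         py_files=["pyfile.py", "zipfile.zip"],
                         app_args=["--arg1", "val1"])
print(shlex.join(cmd))
# prints: spark-submit --py-files pyfile.py,zipfile.zip main.py --arg1 val1
```

The list form can be passed to `subprocess.run(cmd)` if spark-submit is on the PATH; the sketch itself only builds and prints the command.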
Being new to Spark, I would like to know why this first method is preferred over running the script directly with Python (example):
python pyfile-that-uses-pyspark.py
The former method turns up many more examples when googling the topic, but none of them explicitly states why it is preferred. In fact, here is another Stack Overflow question where one answer, repeated below, specifically tells the OP not to use the python method but gives no reason why.
dont run your py file as: python filename.py instead use: spark-submit filename.py
Can someone provide insight?