
How to run scripts with arguments in PyCharm is explained here: Debugging with PyCharm terminal arguments

I would like to run my script as follows:

input1 file.py input2

spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar arg1 arg2

How can I do this? Thanks!

João

1 Answer


I am not 100% sure I understand how you normally run the script, but let's assume you have a script called script.py that takes two arguments, arg1 and arg2, and that spark-submit takes two options, opt1 and opt2. From the command line you would run it as follows:

spark-submit --opt1 opt1 --opt2 opt2 script.py arg1 arg2 

If I understand correctly, in your case this is:

spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar file.py arg1 arg2

Let's also assume that everything runs when you do so from the command line (if not, make sure that works first).

Define environment variables

The goal of this step is to enable running as follows:

python script.py arg1 arg2

To do so you need to define the proper environment variables:

PYTHONPATH

This should include Spark's python directory and the py4j library:

$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-XXX-src.zip
  • $SPARK_HOME is where you installed Spark (e.g. /opt/spark). On Windows you might have defined it as %SPARK_HOME% (or you can just write the path directly).
  • The XXX in the py4j path depends on your Spark version.
    • For example, for Spark 2.0.1 this would be py4j-0.10.3-src.zip.
    • For Spark 1.6.1 I think this was py4j-0.9-src.zip, but you should check.
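If you would rather not hard-code the py4j version, the PYTHONPATH value can be assembled by looking for the zip under $SPARK_HOME. This is only a minimal sketch, assuming a standard Spark layout with python/lib/py4j-*-src.zip; the helper name spark_python_paths is hypothetical and not part of Spark:

```python
import glob
import os


def spark_python_paths(spark_home):
    """Return the two PYTHONPATH entries described above (hypothetical helper)."""
    python_dir = os.path.join(spark_home, "python")
    # The py4j zip name depends on the Spark version, so match it with a glob
    # instead of hard-coding the XXX part.
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    py4j_zips = sorted(glob.glob(pattern))
    if not py4j_zips:
        raise FileNotFoundError("no py4j-*-src.zip found under " + python_dir)
    return [python_dir, py4j_zips[-1]]


def as_pythonpath(spark_home):
    """Join the entries with the platform separator (':' or ';')."""
    return os.pathsep.join(spark_python_paths(spark_home))
```

Calling as_pythonpath(os.environ["SPARK_HOME"]) then gives a string you can paste into the PYTHONPATH environment variable of the run configuration.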

PYSPARK_SUBMIT_ARGS

This tells Spark how to load everything. It should contain all the arguments you pass to spark-submit, followed by "pyspark-shell" at the end. In your case this would probably have the following value:

--jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar pyspark-shell
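As an alternative to entering the value by hand, the same string can be built and set at the top of the script. This is a rough sketch, assuming the variable is set before pyspark creates its SparkContext; build_submit_args is a hypothetical helper, and the jar names are simply the ones from the question:

```python
import os


def build_submit_args(jars):
    """Build a PYSPARK_SUBMIT_ARGS value from a list of jar paths (hypothetical helper)."""
    parts = []
    if jars:
        # spark-submit expects a single comma-separated list after --jars.
        parts += ["--jars", ",".join(jars)]
    # "pyspark-shell" goes at the end, after all spark-submit arguments.
    parts.append("pyspark-shell")
    return " ".join(parts)


# Assumption: this runs before the SparkContext is created in the script.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args([
    "spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar",
    "spark-streaming-kafka-assembly_2.10-1.6.1.jar",
])
```

Setting it in the run configuration keeps the script unchanged; setting it in code, as here, keeps the jar list next to the code that needs it.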

Configure the run configuration

Now you can configure this the same way as any other Python script. Just make sure to put the script's arguments (arg1 arg2) in the script parameters field.
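Whatever you put in the script parameters reaches the script through sys.argv, exactly as it would with python script.py arg1 arg2. A small sketch of the receiving side, assuming exactly two positional arguments; parse_args is a hypothetical helper name:

```python
import sys


def parse_args(argv):
    """Return (arg1, arg2) from an argv-style list (hypothetical helper)."""
    # argv[0] is the script name, so two arguments mean three entries.
    if len(argv) != 3:
        raise SystemExit("usage: script.py arg1 arg2")
    return argv[1], argv[2]


# In the script itself you would call:
#     arg1, arg2 = parse_args(sys.argv)
```

If the run configuration is missing the parameters, this fails immediately with the usage message rather than somewhere inside the Spark job.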

Assaf Mendelson