I am not 100% sure I understand how you normally run the script, but let's assume you have a script called script.py which should receive two arguments, arg1 and arg2, and that when you run it from the command line with spark-submit you pass two options, opt1 and opt2, like this:
spark-submit --opt1 opt1 --opt2 opt2 script.py arg1 arg2
If I understand correctly, in your case this is:
spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar file.py arg1 arg2
Let's also assume that everything runs when you do so from the command line (if not, make sure that works first).
**Define environment variables**
The goal of this step is to enable running as follows:
python script.py arg1 arg2
To do so you need to define the proper environment variables:
PYTHONPATH
Should include Spark's Python sources and the py4j zip (a sketch of an in-script alternative follows the notes below):
$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-XXX-src.zip
- $SPARK_HOME is where you installed Spark (e.g. /opt/spark). On Windows you might have defined it as %SPARK_HOME% (or you can just put the path in directly).
- The XXX in the py4j path depends on your Spark version.
- For example, for Spark 2.0.1 this would be py4j-0.10.3-src.zip.
- For Spark 1.6.1 I think it was py4j-0.9-src.zip, but you should check.
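If you prefer not to manage PYTHONPATH by hand, the same thing can be done from inside the script. This is just a minimal sketch, assuming SPARK_HOME is set in the environment (the /opt/spark fallback and the glob pattern are only illustrative):

import glob
import os
import sys

# Make Spark's Python sources and the py4j zip importable without PYTHONPATH.
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")  # fallback is just an example
sys.path.insert(0, os.path.join(spark_home, "python"))
# The py4j zip name depends on the Spark version, so match it with a glob.
for py4j_zip in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    sys.path.insert(0, py4j_zip)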
PYSPARK_SUBMIT_ARGS
This tells Spark how to load everything. It should include all the arguments you would normally pass to spark-submit, followed by "pyspark-shell" at the end.
In your case this would probably have the following value:
--jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar pyspark-shell
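If you would rather not define PYSPARK_SUBMIT_ARGS in the run configuration, here is a sketch of setting it from inside the script before pyspark is imported (the jar names are the ones from your command, and the app name is made up):

import os

# Must be set before pyspark is imported for the jars to be picked up.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,"
    "spark-streaming-kafka-assembly_2.10-1.6.1.jar pyspark-shell"
)

from pyspark import SparkContext
sc = SparkContext(appName="script")  # appName is just an example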
**Configure the run configuration**
Now you can configure this the same as any other Python script. Just make sure to put the arguments (arg1 arg2) in the script parameters field.
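To check that the parameters come through, a tiny sketch of how script.py would read them (plain sys.argv, assuming exactly two arguments):

import sys

# With "arg1 arg2" in the script parameters field, sys.argv is
# ["script.py", "arg1", "arg2"], exactly as with spark-submit.
if __name__ == "__main__":
    arg1, arg2 = sys.argv[1], sys.argv[2]
    print(arg1, arg2)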