
I have a metrics.py script which calculates a graph.

I can call it from the command line (python ./metrics.py -i [input] [output]).

I want to write a function in Spark that calls the metrics.py script on a provided file path and collects the values that metrics.py prints out.

How can I do that?


1 Answer


In order to run metrics.py, you essentially need to ship it to all the executor nodes that run your Spark job.

To do this, you can either pass it when creating the SparkContext -

sc = SparkContext(conf=conf, pyFiles=['path_to_metrics.py'])

or add it later using the SparkContext's addPyFile method -

sc.addPyFile('path_to_metrics.py')

In either case, do not forget to import the metrics module afterwards and call the function that produces the output you need.

import metrics
metrics.relevant_function()
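
If the goal is to run metrics.py against a list of input paths and gather the results back on the driver, one way is to parallelize the paths and call the function on the executors. Below is a minimal sketch, assuming metrics.py exposes a function (hypothetically named compute_metrics here) that takes a file path and returns the values the script would normally print:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName('run-metrics')
sc = SparkContext(conf=conf, pyFiles=['path_to_metrics.py'])

# Placeholder input paths; replace with your own files.
input_paths = ['/data/file1.txt', '/data/file2.txt']

def run_metrics(path):
    # Import inside the function so it resolves on the executors,
    # where metrics.py was shipped via pyFiles.
    import metrics
    # compute_metrics is a stand-in for whatever function in metrics.py
    # produces the values the script prints on the command line.
    return path, metrics.compute_metrics(path)

# Run on the executors and bring the (path, values) pairs back to the driver.
results = sc.parallelize(input_paths).map(run_metrics).collect()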

Also make sure that all the Python libraries imported inside metrics.py are installed on every executor node. Otherwise, ship them along with your job using the --py-files and --jars options when you spark-submit it.
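
For example, a spark-submit invocation that ships metrics.py alongside your own entry-point script might look roughly like this (driver.py and python_deps.zip are placeholders for your job script and any extra zipped Python dependencies; the master URL depends on your cluster):

spark-submit --master yarn \
  --py-files metrics.py,python_deps.zip \
  driver.py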
