
I am trying to run a Python script in Spark. I am running Spark in client mode on a single node, and the script has some dependencies (e.g. pandas) installed via Conda. There are various resources that cover this use case.
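
I created and packed the environment roughly like this (the environment name pyspark_env and the versions below are illustrative, not the exact ones I used):

# Create a Conda environment holding the script's dependencies
conda create -y -n pyspark_env -c conda-forge python=3.9 pandas conda-pack
# Pack the entire environment into a single relocatable tar archive
conda pack -n pyspark_env -o /tmp/env.tar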

Using those as an example, I run Spark via the following command from the Spark bin directory, where /tmp/env.tar is the Conda environment packed by conda-pack:

export PYSPARK_PYTHON=./environment/bin/python
./spark-submit --archives=/tmp/env.tar#environment script.py
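
script.py itself does nothing unusual; a minimal script of the kind involved looks like this (a sketch, assuming only that it imports pandas and starts a SparkSession):

# script.py: minimal job that exercises the packed environment
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("conda-pack-test").getOrCreate()
# Round-trip a small pandas DataFrame through Spark to confirm pandas resolves
df = spark.createDataFrame(pd.DataFrame({"x": [1, 2, 3]}))
df.show()
spark.stop()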

Spark throws the following exception:

java.io.IOException: Cannot run program "./environment/bin/python": error=2, No such file or directory

Why does this not work? I am also curious about the ./ in the Python path, as it's not clear where Spark unpacks the tar file. I assumed I did not need to load the tar file into HDFS since this is all running on a single node (but perhaps I do for cluster mode?).
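
Since the failure happens before script.py even starts, one way to probe the ./ question is to run a trivial script with the default Python (no PYSPARK_PYTHON override) and print the driver's working directory, e.g. (a diagnostic sketch; check_cwd.py is a made-up name):

# check_cwd.py: print where the driver runs and whether the archive was unpacked there
import os
print("cwd:", os.getcwd())
print("./environment/bin/python exists:", os.path.exists("./environment/bin/python"))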
