PySpark always uses the system's Python

Question

Each system has two Python installations:

① the system's Python

/usr/bin/python

② the user's Python

~/anaconda3/envs/Python3.6/bin/python3

I have a cluster made up of my desktop (master) and my laptop (slave).

The PySpark shell works in every mode if I set the following in ~/.bashrc on both nodes:

export PYSPARK_PYTHON=~/anaconda3/envs/Python3.6/bin/python3

export PYSPARK_DRIVER_PYTHON=~/anaconda3/envs/Python3.6/bin/python3

However, I want to use it with Jupyter Notebook, so I set the following in each node's ~/.bashrc:

export PYSPARK_PYTHON=~/anaconda3/envs/Python3.6/bin/python3

export PYSPARK_DRIVER_PYTHON="jupyter"

export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

Then I get the log.

My Spark version is:

spark-3.0.0-preview2-bin-hadoop3.2

I have read all the answers in

environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON

and

different version 2.7 than that in driver 3.6 in jupyter/all-spark-notebook

But no luck.

I guess the slave's Python 2.7 comes from the system's Python, not from Anaconda's Python.

How can I force Spark's slave node to use Anaconda's Python?

Thanks~!

score 0 · Answer 1 · answered Jul 28 '20 at 19:59

Jupyter is looking for ipython, and you probably only have ipython installed in your system Python.

To use Jupyter with a different Python version, you need a Python version manager (pyenv) and a Python environment manager (virtualenv). Together they let you choose which Python version to use and which environment to install Jupyter in, with fully isolated Python versions and packages.

Install ipykernel and Jupyter in your chosen Python environment.

After you finish the step above, you need to make sure that the Spark worker switches to your chosen Python version and environment every time the Spark Resource Manager launches a worker executor. To switch the Python version and environment when the executor starts, make sure that a small script runs right after the Spark Resource Manager SSHes into the worker:

go to the python environment directory
source 'whatever/bin/activate'
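
A different route from the activation script, offered only as a sketch: the driver can name the worker interpreter itself by setting PYSPARK_PYTHON before the SparkContext is created. The helper names `pyspark_python_env` and `apply_env` are hypothetical. The path is the one from the question and is assumed to exist at the same absolute location on every node.

```python
import os

# Interpreter path from the question. Assumption: the expanded path is
# identical on the master and on every worker node.
ENV_PYTHON = "~/anaconda3/envs/Python3.6/bin/python3"


def pyspark_python_env(interpreter=ENV_PYTHON):
    """Return the environment variable that selects the worker interpreter."""
    # Expand "~" here: a value set from Python never passes through a shell,
    # so nothing else would expand it.
    return {"PYSPARK_PYTHON": os.path.expanduser(interpreter)}


def apply_env(env):
    """Export the variables into the current (driver) process."""
    os.environ.update(env)


if __name__ == "__main__":
    # Call this before creating the SparkContext or SparkSession.
    apply_env(pyspark_python_env())
    print(os.environ["PYSPARK_PYTHON"])
```

This only has an effect if it runs before the context exists. A context that is already running keeps the interpreter it was started with.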

After you have done the steps above, the Spark worker executor should run with your chosen Python version and Jupyter.
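
One way to check the result is to ask the driver and the executors which interpreter they are running. This is a minimal sketch: `interpreter_info` and `executor_interpreters` are hypothetical helper names, and `sc` is assumed to be an existing SparkContext, such as the one the PySpark shell provides.

```python
import platform
import socket
import sys


def interpreter_info(_=None):
    """Describe the Python interpreter that executes this call."""
    return (socket.gethostname(), sys.executable, platform.python_version())


def executor_interpreters(sc, partitions=4):
    """Run interpreter_info on the executors and collect the distinct results.

    `sc` is an existing SparkContext (assumption: created elsewhere).
    """
    rdd = sc.parallelize(range(partitions), partitions)
    return rdd.map(interpreter_info).distinct().collect()


if __name__ == "__main__":
    # Driver side only; call executor_interpreters(sc) from a PySpark session.
    print("driver:", interpreter_info())
```

If the worker still starts a Python with a different minor version from the driver's, PySpark is expected to reject the job with a version-mismatch error instead of returning a list, and that error is itself the answer to the check.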

Thanks for your reply. Why not Anaconda? My virtual environment is OK. — , Jul 29 '20 at 00:55
You can use Anaconda for sure. Can you check the following? 1. ssh into your account, and run the following: 2. which ipython — stanleywxc, Jul 29 '20 at 04:42
