Issue #1 - Fix the Spark Driver
It appears the culprit is the find-spark-home.cmd script that lives in ...\anaconda3\pkgs\pyspark-3.4.0-pyhd8ed1ab_0\python-scripts. By default it looks for python3:
rem Default to standard python3 interpreter unless told otherwise
set PYTHON_RUNNER=python3
rem If PYSPARK_DRIVER_PYTHON is set, it overwrites the python version
if not "x%PYSPARK_DRIVER_PYTHON%"=="x" (
set PYTHON_RUNNER=%PYSPARK_DRIVER_PYTHON%
)
...whereas you (and I) have our Anaconda python exe (v3.10) catalogued under python. (If you have another version installed independently of Anaconda, as I have, you can call that too. For example, setting the variables below to py instead of python runs my pyspark under Python v3.11.)
Run that script as is and you'll get exactly the error you reported.
Setting the override (in Anaconda HOME) in a conda shell worked for me and I was able to start pyspark (from the same directory). See below.
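Once pyspark does start with the override in place (see the Solution section below), a quick sanity check from inside the pyspark shell will tell you which interpreter the Driver actually picked up. A minimal sketch:

import sys
print(sys.executable)   # should point at your Anaconda python.exe, e.g. ...\anaconda3\python.exe
print(sys.version)      # e.g. 3.10.x from the base conda environment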
Issue #2 - Fix the Spark Worker
But then I also needed to override the Python executable/alias used by the Spark workers! A different variable is required for that override (eye-roll), hence the second environment variable below.
[Perhaps separate environment variables let you provide different versions for the Driver and the Workers? ...But why?]
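If you want to see the Driver/Worker split for yourself, you can compare the two interpreter versions from inside a running pyspark shell. A minimal sketch, assuming the shell has already created the usual sc SparkContext for you:

import sys

driver_version = sys.version_info[:3]
worker_version = sc.parallelize([0], 1).map(
    lambda _: __import__("sys").version_info[:3]
).collect()[0]

print("driver:", driver_version)
print("worker:", worker_version)   # a mismatch here is what PYSPARK_PYTHON is there to fix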
Issue #3 - Fix the Python Path
And then pyspark couldn't find my pyspark package:
ModuleNotFoundError: No module named 'pyspark.context'
You can set an environment variable to add additional site-packages directories too, hence the third environment variable below.
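Once the PYTHONPATH from the Solution below is set, you can confirm that the package now resolves from that site-packages directory. A minimal check along these lines:

import pyspark
from pyspark.context import SparkContext   # the import that was failing above

print(pyspark.__version__)   # e.g. 3.4.0
print(pyspark.__file__)      # should sit under the pyspark-3.4.0-pyhd8ed1ab_0\site-packages directory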
Solution
(base) C:\Users\<user>\anaconda3> set PYSPARK_DRIVER_PYTHON=python
(base) C:\Users\<user>\anaconda3> set PYSPARK_PYTHON=python
(base) C:\Users\<user>\anaconda3> set PYTHONPATH=C:\Users\<user>\anaconda3\pkgs\pyspark-3.4.0-pyhd8ed1ab_0\site-packages
(base) C:\Users\<user>\anaconda3> pyspark
Once you have it working successfully from the conda shell, I suggest you try it from a Jupyter notebook:
%set_env PYSPARK_PYTHON=python
%set_env PYSPARK_DRIVER_PYTHON=python
%set_env PYTHONPATH=C:\Users\<user>\anaconda3\pkgs\pyspark-3.4.0-pyhd8ed1ab_0\site-packages
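As a final end-to-end check in the notebook, a cell along the lines below should start a local session and run a couple of trivial jobs (the app name is just a placeholder of mine):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("pyspark-env-check")   # placeholder name
    .getOrCreate()
)

spark.range(5).show()   # trivial DataFrame job, stays on the JVM side

# force a Python worker to spin up, which exercises PYSPARK_PYTHON
print(spark.sparkContext.parallelize(range(5)).map(lambda x: x * x).collect())

spark.stop()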