
I have been stuck on pyspark for a few days. I am following the instructions here; I followed them closely and installed Anaconda, Java, pyspark, and findspark. Everything was good and smooth, with no errors, until I tried to validate pyspark by running pyspark. It gave me an error like this:

(my-env) C:\Users\chili>pyspark

Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases. The system cannot find the path specified. The system cannot find the path specified.

I added the path to the Path system variable; the path is C:\ProgramData\Anaconda3

If I type python, it runs just fine.

(my-env) C:\Users\chili>python

Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32 Type "help", "copyright", "credits" or "license" for more information.

I have been searching for a long time. I tried the solution here, but it was not helpful to me.

I am really confused. Could anybody help me, please? Thank you so much.

Lenny

2 Answers


If you are on Windows, it looks like this is not as straightforward as the link you followed suggests. The Windows file system is different from Linux, so you have to use winutils.exe and manage the paths. You can see a tutorial there. To me, it sounds more complicated than running a VM with Linux.

p-y
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/late-answers/33556271) – pigrammer Jan 05 '23 at 02:00

Issue #1 - Fix the Spark Driver

It appears the culprit is the find-spark-home.cmd script that lives in ...\anaconda3\pkgs\pyspark-3.4.0-pyhd8ed1ab_0\python-scripts

It's looking for python3 by default:

rem Default to standard python3 interpreter unless told otherwise
set PYTHON_RUNNER=python3
rem If PYSPARK_DRIVER_PYTHON is set, it overwrites the python version
if not "x%PYSPARK_DRIVER_PYTHON%"=="x" (
  set PYTHON_RUNNER=%PYSPARK_DRIVER_PYTHON%
)
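For clarity, the fallback logic of that batch snippet amounts to the following (a Python sketch of the same decision, not part of Spark itself):

```python
import os

def pick_python_runner(env=os.environ):
    """Mirror find-spark-home.cmd: use PYSPARK_DRIVER_PYTHON if it is set
    to a non-empty value, otherwise fall back to the 'python3' alias --
    which Windows/Anaconda setups often don't define, hence the error."""
    return env.get("PYSPARK_DRIVER_PYTHON") or "python3"

# With no override, the script tries 'python3' and fails on this setup:
print(pick_python_runner({}))                                   # python3
# With the override set, it runs whatever you point it at:
print(pick_python_runner({"PYSPARK_DRIVER_PYTHON": "python"}))  # python
```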

...whereas you (and I) have our Anaconda python exe (v3.10) catalogued under python. (If you have another version installed independently of Anaconda, as I do, you can call that too. For example, setting the variables below to py instead of python runs my pyspark under Python v3.11.)

Run that script as-is, and you'll get exactly the error you reported.

Setting the override (in the Anaconda home directory) in a conda shell worked for me, and I was able to start pyspark from the same directory. See below.

Issue #2 - Fix the Spark Worker

But then I also needed to override the Python executable/alias used by the Spark workers. A different variable is required for that override (eye-roll) - hence the second environment variable below.

[Perhaps separate environment variables let you provide different Python versions for the driver and the workers? ...But why?]

Issue #3

And then pyspark couldn't find my pyspark package:

ModuleNotFoundError: No module named 'pyspark.context'

You can set an environment variable to add additional site-packages directories too - hence the third environment variable below.
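To see why PYTHONPATH helps here: any directory listed in it is prepended to sys.path of every Python interpreter you subsequently launch, which is how the driver-side pyspark modules become importable. A quick sketch (the temporary directory is just a stand-in for the pyspark site-packages directory above):

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for ...\pyspark-3.4.0-pyhd8ed1ab_0\site-packages from the answer.
extra_dir = tempfile.mkdtemp()

# Launch a child interpreter with PYTHONPATH pointing at that directory.
env = dict(os.environ, PYTHONPATH=extra_dir)
code = "import sys\nfor p in sys.path:\n    print(p)"
out = subprocess.run(
    [sys.executable, "-c", code],
    env=env, capture_output=True, text=True,
).stdout

# The PYTHONPATH entry shows up in the child's sys.path.
print(extra_dir in out.splitlines())  # True
```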

Solution

(base) C:\Users\<user>\anaconda3> set PYSPARK_DRIVER_PYTHON=python
(base) C:\Users\<user>\anaconda3> set PYSPARK_PYTHON=python
(base) C:\Users\<user>\anaconda3> set PYTHONPATH=C:\Users\<user>\anaconda3\pkgs\pyspark-3.4.0-pyhd8ed1ab_0\site-packages
(base) C:\Users\<user>\anaconda3> pyspark
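Since `set` only affects the current shell session, it's easy to launch pyspark from a window where one of the variables is missing. A small sanity check you can run first (a hypothetical helper, not part of Spark):

```python
import os

REQUIRED = ("PYSPARK_DRIVER_PYTHON", "PYSPARK_PYTHON", "PYTHONPATH")

def missing_spark_vars(env=os.environ):
    """Return the names of any required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

# Example: only PYTHONPATH is still missing in this hypothetical shell.
print(missing_spark_vars({"PYSPARK_DRIVER_PYTHON": "python",
                          "PYSPARK_PYTHON": "python"}))  # ['PYTHONPATH']
```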

Once you have it working successfully from the conda shell, I suggest you try it from a Jupyter notebook:

%set_env PYSPARK_PYTHON=python
%set_env PYSPARK_DRIVER_PYTHON=python
%set_env PYTHONPATH=C:\Users\<user>\anaconda3\pkgs\pyspark-3.4.0-pyhd8ed1ab_0\site-packages
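If you prefer plain Python to the %set_env magics, the equivalent is to set os.environ in the first cell, before any pyspark import (the path below is the same example path from the answer; adjust it to your install). Note that changing PYTHONPATH after the interpreter has started does not update the notebook's own sys.path - it only affects processes Spark spawns - so you may also need sys.path.append for the current kernel:

```python
import os

# Same effect as the three %set_env lines above, done before importing pyspark.
os.environ["PYSPARK_PYTHON"] = "python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "python"
os.environ["PYTHONPATH"] = (
    r"C:\Users\<user>\anaconda3\pkgs\pyspark-3.4.0-pyhd8ed1ab_0\site-packages"
)

print(os.environ["PYSPARK_DRIVER_PYTHON"])  # python
```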
  • For Python 3.9 and [Py]Spark 3.2.1, I found that the setting PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON helped, but setting PYTHONPATH threw me for a multi-week loop. It is documented [here](https://stackoverflow.com/questions/76650431), and in particular, [here](https://stackoverflow.com/a/76752023/2153235). – user2153235 Jul 24 '23 at 18:54