
I have installed pyspark in a Python virtualenv, along with the newly released JupyterLab (http://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html). I have not been able to launch pyspark from a Jupyter notebook in such a way that the SparkContext variable is available.

3 Answers

First, activate the virtualenv and point pyspark at it:

source venv/bin/activate   # activate the virtualenv
export SPARK_HOME={path_to_venv}/lib/python2.7/site-packages/pyspark
export PYSPARK_DRIVER_PYTHON=jupyter-lab   # make pyspark launch JupyterLab as its driver
pyspark   # starts Spark and opens JupyterLab with sc already defined

Before this, make sure you have already run pip install pyspark and pip install jupyterlab inside your virtualenv.

To check, once JupyterLab is open, type sc in a cell. The SparkContext object should be available and the output should look like this:

SparkContext
Spark UI
Version: v2.2.1
Master: local[*]
AppName: PySparkShell
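
If that renders, a quick smoke test in a cell confirms the context actually runs jobs. A minimal sketch, assuming the sc that pyspark created at startup:

sc.version                         # e.g. '2.2.1'
sc.parallelize(range(100)).sum()   # runs a real Spark job; should return 4950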
Comment: For me, Spark 3.2.0 + Python 3.9, this works: `SPARK_HOME=.venv/lib/python3.9/site-packages/pyspark PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark` – ospider Nov 26 '21 at 12:29

You need to export $PYSPARK_PYTHON so that it points at your virtualenv's Python:

export PYSPARK_PYTHON={path/to/your/virtualenv}/bin/python

That solved my case.
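
To confirm that both the driver and the executors picked up the virtualenv interpreter, here is a minimal check, a sketch to be run inside the pyspark session where sc is defined:

import sys
print(sys.executable)                                             # driver Python
print(sc.parallelize([0]).map(lambda _: sys.executable).first())  # executor Python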


In my case (Windows, Python 3.7.4, Spark 3.1.1), the problem was that pyspark was looking for a python3.exe that did not exist. I made a copy of venv/Scripts/python.exe and renamed the copy venv/Scripts/python3.exe.
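
The same fix as a short Python sketch; the paths assume a virtualenv named venv in the current directory:

import shutil
# Give pyspark the python3.exe it looks for by copying the venv's interpreter.
shutil.copyfile("venv/Scripts/python.exe", "venv/Scripts/python3.exe")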