
I am running a public Jupyter notebook server, following this tutorial: http://jupyter-notebook.readthedocs.io/en/stable/public_server.html

I want to use pyspark-2.2.1 with this server. I pip-installed py4j and downloaded spark-2.2.1 from the repository.

Locally, I added the following lines to my .bashrc:

export SPARK_HOME='/home/ubuntu/spark-2.2.1-bin-hadoop2.7'  
export PATH=$SPARK_HOME:$PATH  
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH

and everything works fine when I run Python locally.

However, when using the notebook server, I cannot import pyspark, because the above commands have not been executed at the Jupyter server's startup.

I partly (and inelegantly) worked around the issue by typing

import sys
sys.path.append("/home/ubuntu/spark-2.2.1-bin-hadoop2.7/python")

in the first cell of my notebook. But

from pyspark import SparkContext
sc = SparkContext()
myrdd = sc.textFile('exemple.txt')
myrdd.collect()  # Everything works fine until here
words = myrdd.map(lambda x:x.split())
words.collect()

returns the error

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.io.IOException: Cannot run program "python": error=2, No such file or directory

Any idea how I can set the correct paths (either manually or at startup)?
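For what it's worth, the `Cannot run program "python"` error usually means the Spark executors cannot find a `python` binary on their PATH. A common workaround (a sketch only, untested on this server; the Spark path is taken from the question above) is to point Spark at the notebook's own interpreter in the first cell, before creating the SparkContext:

```python
import os
import sys

# Tell Spark which Python to use for the driver and the executors,
# using the same interpreter that runs this notebook.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

# Make pyspark importable, as in the workaround above
# (path assumed from the question; adjust to your install).
os.environ.setdefault("SPARK_HOME", "/home/ubuntu/spark-2.2.1-bin-hadoop2.7")
sys.path.append(os.path.join(os.environ["SPARK_HOME"], "python"))
```

After this cell, `from pyspark import SparkContext` and the `map`/`collect` calls should run against an interpreter that actually exists on the machine.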

Thanks

  • Use a Jupyter kernel; see detailed answer in [Configuring Spark to work with Jupyter Notebook and Anaconda](https://stackoverflow.com/questions/47824131/configuring-spark-to-work-with-jupyter-notebook-and-anaconda/47870277#47870277) – desertnaut Feb 11 '18 at 10:31
  • Possible duplicate of [Configuring Spark to work with Jupyter Notebook and Anaconda](https://stackoverflow.com/questions/47824131/configuring-spark-to-work-with-jupyter-notebook-and-anaconda) – desertnaut Feb 11 '18 at 10:32
  • Thank you. Finally solved the issue by adding the following line to my /etc/systemd/system/jupyter.service : ExecStart=/bin/bash -c "PATH=[all paths i need]" – Salva Martini Feb 13 '18 at 10:06
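The systemd fix described in the comment above might look roughly like the following unit fragment. This is a hypothetical sketch: the bracketed PATH entries were elided by the asker, and the `jupyter notebook` invocation is assumed, not taken from their actual unit file.

```shell
# /etc/systemd/system/jupyter.service (fragment, hypothetical)
[Service]
Environment="SPARK_HOME=/home/ubuntu/spark-2.2.1-bin-hadoop2.7"
Environment="PYTHONPATH=/home/ubuntu/spark-2.2.1-bin-hadoop2.7/python"
ExecStart=/bin/bash -c "PATH=[all paths i need] jupyter notebook"
```

Reload with `systemctl daemon-reload` and restart the service so the new environment takes effect.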

0 Answers