
I have been searching on Stack Overflow and elsewhere for the error I am seeing and have tried a few of the suggested answers, but none of them works here (I will keep searching and update this post):

I have a fresh Ubuntu installation with Anaconda3 and Spark 2 installed:

Anaconda3: /home/rxie/anaconda
Spark 2: /home/rxie/Downloads/spark

I am able to start Jupyter Notebook, but I am not able to create a SparkSession:

from pyspark.conf import SparkConf

ModuleNotFoundError                       Traceback (most recent call last)
in ()
----> 1 from pyspark.conf import SparkConf

ModuleNotFoundError: No module named 'pyspark'

Here are my environment variables in .bashrc:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SPARK_HOME=/home/rxie/spark/
export SBT_HOME=/usr/share/sbt/bin/sbt-launch.jar
export SCALA_HOME=/usr/local/src/scala/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH
export PATH=$SPARK_HOME/bin:$PATH
export PATH=$PATH:$SBT_HOME/bin:$SPARK_HOME/bin

# added by Anaconda3 installer
export PATH="/home/rxie/anaconda3/bin:$PATH"
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
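
For reference, a quick way to check what the notebook kernel actually sees is the small sketch below (nothing here is specific to my setup; the printed paths will of course differ):

import os
import sys

print(sys.executable)                    # which Python the notebook kernel runs on
print(os.environ.get('SPARK_HOME'))      # None means .bashrc was not picked up by the kernel
print([p for p in sys.path if 'spark' in p.lower()])  # empty list means pyspark is not on sys.path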

What's wrong with the SparkConf import in the Jupyter notebook?

It would be greatly appreciated if anyone could shed some light on this, thank you very much.

Choix

3 Answers


For some reason, Jupyter doesn't work correctly when it is installed with Anaconda. I had the same problem and solved it by reinstalling the jupyter package in the virtual environment.

In your virtual environment do:

pip install jupyter
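
A quick way to confirm the reinstall took effect is to check, from a notebook cell, which interpreter the kernel runs on (a small sketch; the expected path depends on where your virtual environment lives, and the pyspark import only succeeds if pyspark was also installed into that environment):

import sys
print(sys.executable)      # should point inside your virtual environment

import pyspark             # succeeds only if pyspark is installed in the same environment
print(pyspark.__version__)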
  • Thanks. Reinstalling jupyter is not always acceptable in some situations. I believe in this case the blank values of the two variables were the culprit. – Choix Aug 22 '18 at 00:42

If you are in Python, you need to initialize your Spark session:

import os
import sys
# Locate the Spark installation from the environment.
spark_home = os.environ.get('SPARK_HOME', None)
# Put Spark's Python package and its bundled py4j on the module search path.
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.4-src.zip'))
# Run Spark's shell bootstrap, which creates the SparkSession (execfile is Python 2 only).
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))

Above is my code; you may need to find the corresponding libraries in your Spark installation and replace the paths above accordingly.
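
Note that execfile only exists on Python 2. If you are on Python 3 (as with Anaconda3), a rough equivalent, assuming the same SPARK_HOME layout and py4j version, would be:

import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.4-src.zip'))

# Python 3 replacement for execfile: shell.py creates the SparkSession (`spark`)
# and SparkContext (`sc`) in the current namespace when executed.
with open(os.path.join(spark_home, 'python/pyspark/shell.py')) as f:
    exec(f.read())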

If you are lucky, you will see something like this:

Python 2.7.13 |Anaconda, Inc.| (default, Sep 22 2017, 00:47:24)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.1-mapr-1803
      /_/

Using Python version 2.7.13 (default, Sep 22 2017 00:47:24)
SparkSession available as 'spark'.
>>> from pyspark.conf import SparkConf
>>> SparkConf
<class 'pyspark.conf.SparkConf'>
>>>
Christopher

With the final PATH being the following, the notebook starts working as expected:

$ echo $PATH
/usr/lib64/qt-.3/bin:/home/rxie/perl5/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin

AND:

$ echo $PYSPARK_DRIVER_PYTHON
jupyter
$ echo $PYSPARK_DRIVER_PYTHON_OPTS
notebook
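
With those two variables set, the idea is to start the notebook through the pyspark launcher rather than plain jupyter, so the kernel gets Spark's Python packages on its path. As a sketch (assuming the notebook was started that way), a first cell could be:

from pyspark.conf import SparkConf       # the import from the question now resolves
from pyspark.sql import SparkSession

# getOrCreate() reuses a session if the pyspark launcher already bootstrapped one,
# and builds a new one otherwise.
spark = SparkSession.builder.appName("notebook-check").getOrCreate()
spark.range(5).show()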
Choix
  • Well, nothing on your PATH there is Spark specific, so that doesn't seem to be the solution – OneCricketeer Aug 21 '18 at 21:07
  • would it be because of the two variables? they were blank before – Choix Aug 22 '18 at 00:39
  • No, everything after `/usr/lib64/qt-.3/bin:/home/rxie/perl5/bin` is default OS path settings. And those two things are QT and Perl, so nothing on your PATH is Spark related. – OneCricketeer Aug 22 '18 at 00:43