I'm trying to install the graphframes package, following some instructions I have read.

My first attempt was to do this in the command line:

pyspark --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11

This worked, and the package was downloaded to the machine successfully.

However, when I try to import the package in my Jupyter notebook, it displays the error:

can't find module 'graphframes'

My next attempt was to copy the package folder /graphframes into /site-packages, but I could not do it with a simple cp command.

I'm quite new to Spark, and I'm sure I'm missing some part of the configuration.

Could you please help me?

Also
  • Possible duplicate of [No module named graphframes Jupyter Notebook](https://stackoverflow.com/questions/50286139/no-module-named-graphframes-jupyter-notebook) – Sida Zhou Mar 02 '19 at 02:21

2 Answers

This was what worked for me.

Extract the contents of the graphframes-xxx-xxx-xxx.jar file. You should get something like

graphframes
 | -- examples
       |-- ...
 | -- __init__.py
 | -- ...

Zip up the entire folder (not just the contents) and name it whatever you want. We'll just call it graphframes.zip.
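The extract-and-zip steps above can also be scripted. The following is a minimal sketch, assuming the jar contains a top-level graphframes/ folder with an __init__.py as shown in the tree above; the function name repackage_python_package is hypothetical and not part of graphframes or Spark.

```python
import os
import zipfile


def repackage_python_package(jar_path, zip_path, package="graphframes"):
    """Copy the `package/` folder out of a jar into a new zip archive.

    The top-level folder name is kept inside the zip (the "entire folder,
    not just the contents" requirement), so the archive can later be put
    on the Python path. Hypothetical helper, for illustration only.
    """
    prefix = package + "/"
    with zipfile.ZipFile(jar_path) as jar:
        # A jar is an ordinary zip archive; select only the Python package files.
        names = [
            info.filename
            for info in jar.infolist()
            if info.filename.startswith(prefix) and not info.is_dir()
        ]
        if prefix + "__init__.py" not in names:
            raise ValueError(
                "no %s__init__.py found in %s" % (prefix, os.path.basename(jar_path))
            )
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as out:
            for name in names:
                # Entry names are written unchanged, e.g. graphframes/__init__.py.
                out.writestr(name, jar.read(name))
    return names
```

Calling repackage_python_package with the path to the downloaded jar and "graphframes.zip" as the target should produce an archive equivalent to the manually zipped folder.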

Then, run the pyspark shell with

pyspark --py-files graphframes.zip \
    --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11

You may need to do

sc.addPyFile('graphframes.zip')

before

import graphframes
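If the import still fails, it can help to check whether the zip is laid out so that Python can find the package at all. The sketch below is a driver-side diagnostic only: it puts the zip on sys.path (roughly the effect sc.addPyFile has on the driver for a .zip file) and asks importlib whether the module resolves. The helper name importable_from_zip is hypothetical.

```python
import importlib
import importlib.util
import sys


def importable_from_zip(zip_path, module_name="graphframes"):
    """Return True if module_name resolves once zip_path is on sys.path.

    This only locates the module; it does not import it or start Spark.
    Hypothetical helper, for illustration only.
    """
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    # Make sure the import system notices the newly added path entry.
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None
```

A result of False for graphframes.zip usually means the archive holds the folder's contents at its root rather than the graphframes folder itself.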
absolutelydevastated

The simplest way to use Jupyter with pyspark and graphframes is to start Jupyter from pyspark.

Just open your terminal, set the two environment variables, and start pyspark with the graphframes package:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
pyspark --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11

Another advantage of this approach is that if you later want to run your code via spark-submit, you can pass the same --packages option.
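For anyone who prefers to launch this from a script rather than typing the exports each time, here is a minimal Python sketch of the same three lines. It assumes pyspark and jupyter are on the PATH; the function names jupyter_pyspark_launch and launch are hypothetical.

```python
import os
import subprocess

# Package coordinates taken from the command above.
GRAPHFRAMES_COORDS = "graphframes:graphframes:0.6.0-spark2.3-s_2.11"


def jupyter_pyspark_launch(package=GRAPHFRAMES_COORDS, base_env=None):
    """Build the environment and argument list for starting Jupyter via pyspark."""
    env = dict(os.environ if base_env is None else base_env)
    # Same two variables as the export lines: use Jupyter as the driver front end.
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    argv = ["pyspark", "--packages", package]
    return env, argv


def launch(package=GRAPHFRAMES_COORDS):
    """Start pyspark with the settings above; blocks until the notebook server exits."""
    env, argv = jupyter_pyspark_launch(package)
    return subprocess.run(argv, env=env, check=True)
```

Calling launch() should behave like the shell commands above; jupyter_pyspark_launch() alone only returns the settings, which makes them easy to inspect first.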

Alex Ortner