I want to run graphframes with pyspark. I found this answer and followed its instructions, but it doesn't work.
This is my code, hello_spark.py:
import pyspark
conf = pyspark.SparkConf().set("spark.driver.host", "127.0.0.1")
sc = pyspark.SparkContext(master="local", appName="myAppName", conf=conf)
sc.addPyFile("/opt/spark/jars/spark-graphx_2.12-3.0.2.jar")
from graphframes import *
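As far as I can tell, spark-graphx_2.12-3.0.2.jar is Spark's own GraphX jar rather than graphframes, so I suspect the addPyFile line is pointing at the wrong jar. One variant I considered (just a guess on my part) is letting Spark resolve the package itself through spark.jars.packages instead:

import pyspark

# Guess: ask Spark to fetch the graphframes package at startup via
# spark.jars.packages, instead of adding Spark's own GraphX jar by hand.
conf = (
    pyspark.SparkConf()
    .set("spark.driver.host", "127.0.0.1")
    .set("spark.jars.packages", "graphframes:graphframes:0.8.1-spark3.0-s_2.12")
)
sc = pyspark.SparkContext(master="local", appName="myAppName", conf=conf)

from graphframes import *

I don't know whether spark.jars.packages is the right property for this, though.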
When I run hello_spark.py with this command:
spark-submit hello_spark.py
It returns this error:
from graphframes import *
ModuleNotFoundError: No module named 'graphframes'
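I also wondered whether I have to pass the package to spark-submit explicitly instead of relying on the environment, i.e. something like this (untested guess on my part):
spark-submit --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12 hello_spark.py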
This is my .bashrc config:
# For Spark setup
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_PYTHON=/usr/bin/python3
export SPARK_LOCAL_IP=localhost
export SPARK_OPTS="--packages graphframes:graphframes:0.8.1-spark3.0-s_2.12"
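I could not find SPARK_OPTS documented as something spark-submit actually reads, so I'm not sure that last export does anything here. From other posts, the variable pyspark itself seems to read is PYSPARK_SUBMIT_ARGS, e.g. (again an assumption on my part; apparently it must end with pyspark-shell):
export PYSPARK_SUBMIT_ARGS="--packages graphframes:graphframes:0.8.1-spark3.0-s_2.12 pyspark-shell"
But I don't know whether that applies when going through spark-submit.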
My Spark version is 3.0.2 and my Scala version is 2.12.10.
I installed graphframes with this command:
pyspark --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12
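My understanding (possibly wrong) is that --packages only downloads the jar into ~/.ivy2 and puts it on the classpath of that one interactive session, rather than installing the Python module permanently. Inside such a session I would expect this import check to pass:

# quick check inside the pyspark shell started above; my assumption is that
# the package is only visible in this one session, not to spark-submit
from graphframes import GraphFrame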
Does anyone know how to fix this? Thanks.