How do I install graphframes on Google colab?

I tried !pip install graphframes, but when I call g = GraphFrame(v,e) I get the error: An error occurred while calling o503.loadClass.: java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI. I am using Spark 2.4.

I also tried ! pyspark --packages graphframes:graphframes:0.8.0-spark2.4-s_2.11

None of the other sources I found seem to work on the Colab platform.

Sample_friend

2 Answers

I had the same problem. You just have to upload

 graphframes-0.8.0-spark2.4-s_2.11.jar

to

/usr/local/lib/python3.6/dist-packages/pyspark/jars

on your Google Colab instance after you have installed graphframes. You have to do this every time you start Colab.

You can download the file from within your notebook like this:

!curl -L -o "/usr/local/lib/python3.6/dist-packages/pyspark/jars/graphframes-0.8.0-spark2.4-s_2.11.jar" http://dl.bintray.com/spark-packages/maven/graphframes/graphframes/0.8.0-spark2.4-s_2.11/graphframes-0.8.0-spark2.4-s_2.11.jar
dboberic
  • Slightly off topic, but would you recommend GraphFrames, or should I upgrade to Spark 3 to use its new graph functionality? – Sample_friend Apr 14 '20 at 04:33
  • Unfortunately, I am not familiar with the new features of Spark 3, so I cannot advise you. As far as I can see, Spark 3 will introduce the Cypher query language from Neo4j. I know that Neo4j supports almost all the same algorithms as GraphFrames (path finding, centrality algorithms, community detection algorithms), so I assume they will be in the new Spark Graph module. The only question is whether there will be a Python API, because for GraphX, Spark 2 has APIs only for Scala and Java. If the language is not so important to you, then I would choose Spark 3 because of its better integration with other Spark modules. – dboberic Apr 14 '20 at 12:14
  • I'm getting "Warning: /usr/local/lib/python3.6/dist-packages/pyspark/jars/graphframes-0.8.0- Warning: spark2.4-s_2.11.jar: No such file or directory". I can import pyspark, but there is no '/pyspark' folder under "/usr/local/lib/python3.6/dist-packages". – lightbox142 Jun 01 '20 at 05:14
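
Regarding the missing-folder warning in the last comment: the path in the answer is tied to Python 3.6 and to a pip-installed pyspark, so it may not exist on every runtime. Below is a minimal sketch, not part of the original answer, that looks for the jars folder instead of hardcoding it. The helper name find_spark_jars_dir is hypothetical, and checking SPARK_HOME assumes Spark may have been unpacked from a tarball rather than installed with pip.

```python
import importlib.util
import os
from pathlib import Path
from typing import Optional


def find_spark_jars_dir(package: str = "pyspark",
                        env_var: str = "SPARK_HOME") -> Optional[Path]:
    """Return the first existing 'jars' directory, or None if none is found.

    Hypothetical helper: it checks the installed Python package first,
    then the directory named by the given environment variable.
    """
    candidates = []

    # Case 1 (assumption): pyspark was installed with pip, so its package
    # directory contains a 'jars' subfolder.
    spec = importlib.util.find_spec(package)
    if spec is not None and spec.origin is not None:
        candidates.append(Path(spec.origin).parent / "jars")

    # Case 2 (assumption): Spark was unpacked manually and SPARK_HOME points to it.
    spark_home = os.environ.get(env_var)
    if spark_home:
        candidates.append(Path(spark_home) / "jars")

    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None


# Jar file name taken from the answer above.
JAR_NAME = "graphframes-0.8.0-spark2.4-s_2.11.jar"

jars_dir = find_spark_jars_dir()
if jars_dir is None:
    print("No Spark jars directory was found in this environment.")
else:
    print("Place the jar at:", jars_dir / JAR_NAME)
```

The printed path can then be used as the -o target of the curl command from the answer.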

I had the same problem and solved it with this command:

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").config("spark.jars.packages", "graphframes:graphframes:0.7.0-spark2.4-s_2.11").getOrCreate()
shtuder
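
The package strings used in this thread (0.8.0-spark2.4-s_2.11 and 0.7.0-spark2.4-s_2.11) follow one pattern: GraphFrames version, Spark version, Scala version. The sketch below, which is not from either answer, builds that string and passes it to the session builder the same way the second answer does. The function names graphframes_coordinate and build_session are hypothetical, and the sketch assumes the chosen version combination is actually published.

```python
def graphframes_coordinate(gf_version: str, spark_version: str, scala_version: str) -> str:
    """Build a package coordinate in the pattern used by the commands in this thread."""
    return f"graphframes:graphframes:{gf_version}-spark{spark_version}-s_{scala_version}"


def build_session(coordinate: str):
    """Start a local SparkSession that asks Spark to fetch the given package."""
    # Imported lazily so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[*]")
        .config("spark.jars.packages", coordinate)
        .getOrCreate()
    )


# Versions taken from the question: GraphFrames 0.8.0, Spark 2.4, Scala 2.11.
coordinate = graphframes_coordinate("0.8.0", "2.4", "2.11")
print(coordinate)
```

In a notebook, spark = build_session(coordinate) followed by from graphframes import GraphFrame would be the intended use. Because getOrCreate() returns any session that already exists, the package setting most likely only takes effect when no Spark session has been started yet in the current runtime.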