
I am trying to run the code below to create a GraphFrame in PySpark, which is set up on my local machine, but I am getting an error. I am using spark-2.4.0-bin-hadoop2.7.

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#spark = SparkSession.builder.appName('fun').getOrCreate()
vertices = spark.createDataFrame([('1', 'Carter', 'Derrick', 50),
                                  ('2', 'May', 'Derrick', 26),
                                  ('3', 'Mills', 'Jeff', 80),
                                  ('4', 'Hood', 'Robert', 65),
                                  ('5', 'Banks', 'Mike', 93),
                                  ('98', 'Berg', 'Tim', 28),
                                  ('99', 'Page', 'Allan', 16)],
                                 ['id', 'name', 'firstname', 'age'])
edges = spark.createDataFrame([('1', '2', 'friend'),
                               ('2', '1', 'friend'),
                               ('3', '1', 'friend'),
                               ('1', '3', 'friend'),
                               ('2', '3', 'follows'),
                               ('3', '4', 'friend'),
                               ('4', '3', 'friend'),
                               ('5', '3', 'friend'),
                               ('3', '5', 'friend'),
                               ('4', '5', 'follows'),
                               ('98', '99', 'friend'),
                               ('99', '98', 'friend')],
                              ['src', 'dst', 'type'])
g = GraphFrame(vertices, edges)

I am getting the below error.

[screenshot of the error traceback]

Akash

2 Answers


The following seems to work for me.

  1. Download the .jar file from https://spark-packages.org/package/graphframes/graphframes
  2. Since I had pyspark running under Anaconda, I added the .jar file to /anaconda3/lib/python3.7/site-packages/pyspark/jars/, alongside the other .jar files.
  3. Then, the following script seems to work.
# Ref: https://stackoverflow.com/a/50404308/9331359
from pyspark import SparkContext
context = SparkContext()
# register the GraphFrames jar so its Python bindings become importable
context.addPyFile('/anaconda3/lib/python3.7/site-packages/pyspark/jars/graphframes-0.7.0-spark2.4-s_2.11.jar')
context  # in a notebook this line just displays the SparkContext


# Ref: https://stackoverflow.com/a/55430066/9331359
from pyspark.sql.session import SparkSession
spark = SparkSession(context)

from pyspark.sql.types import *
from graphframes import *  # works now that the jar has been added via addPyFile
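
With the jar registered and the imports in place, the GraphFrame from the question builds for me. A minimal sketch as a sanity check (it reuses a couple of rows from the question's DataFrames; the inDegrees call is just my own quick query to confirm the jar actually loaded):

# sketch: rebuild a small version of the question's graph to verify the setup
vertices = spark.createDataFrame([('1', 'Carter', 'Derrick', 50),
                                  ('2', 'May', 'Derrick', 26)],
                                 ['id', 'name', 'firstname', 'age'])
edges = spark.createDataFrame([('1', '2', 'friend'),
                               ('2', '1', 'friend')],
                              ['src', 'dst', 'type'])

g = GraphFrame(vertices, edges)  # no longer raises a NameError
g.vertices.show()                # sanity check that the graph was created
g.inDegrees.show()               # a standard GraphFrames query; confirms the jar loaded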
bkowshik

You can resolve the error with the following steps:

1) Download the graphframes jar from the link below, matching the Spark version you are using (e.g. 0.7.0-spark2.4-s_2.11, since you are on Spark 2.4):

https://spark-packages.org/package/graphframes/graphframes

2) Add the downloaded graphframes jar to your Spark jars directory, e.g. $SPARK_HOME/jars.

3) Launch pyspark with the --packages argument the first time, so that it downloads all of graphframes' jar dependencies (an alternative that configures the package on the SparkSession itself is sketched after these steps):

e.g. on a Windows machine, you can launch it from the command prompt:

$SPARK_HOME/bin/pyspark --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11

4) Issue the following import before you run any graph commands:

from graphframes import *

The above steps will resolve your issue.
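
As a side note that is my own addition rather than part of the steps above: once step 3 has pulled the dependencies, the same package coordinate can also be supplied via the spark.jars.packages config when building the SparkSession, so you do not have to pass --packages on every launch. A sketch, assuming the same 0.7.0-spark2.4-s_2.11 coordinate:

# sketch (my variation, not part of the answer's steps): set the same
# graphframes coordinate via spark.jars.packages instead of the pyspark flag
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('fun')
         .config('spark.jars.packages',
                 'graphframes:graphframes:0.7.0-spark2.4-s_2.11')
         .getOrCreate())

from graphframes import *  # step 4: import only after the session exists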

Kaa
  • Thank you for the reply. This is the error I am getting: Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. : java.io.FileNotFoundException: File file:/C:/Users/Akash%20Jain/.ivy2/jars/graphframes_graphframes-0.7.0-spark2.4-s_2.11.jar does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824) – Akash Jan 12 '19 at 10:14
  • Any help on this? What could be the reason? Why is it hitting my user folder? – Akash Jan 13 '19 at 06:07
  • Can you please paste your new code and the steps you followed to resolve it? – Kaa Jan 17 '19 at 14:29
  • My code remains the same; I just followed all the steps you mentioned and then executed the code above. It says the jar does not exist at that location... – Akash Jan 18 '19 at 08:13
  • Did you include the statement "from graphframes import *" in your code? I don't see it in your original code. Also, did you execute step 3 of the solution correctly? – Kaa Jan 21 '19 at 15:11
  • I did run from graphframes import *. I too suspect something is wrong with my step 3 execution, but I am not sure how to debug it. I did not get any error while executing step 3, though. Any pointers on how to check whether it was done correctly? Maybe a screenshot taken after executing step 3 would help. – Akash Jan 21 '19 at 15:16
  • What value did you use for $SPARK_HOME in step 3? It should be the actual path. – Kaa Jan 21 '19 at 17:15
  • @Dhineshkumar Not really. – Akash Nov 13 '19 at 06:17