Questions tagged [graphframes]

DataFrame-based graph library for Apache Spark

GraphFrames is a DataFrame-based alternative to core GraphX with cross-language support.

186 questions
19
votes
5 answers

Unable to run a basic GraphFrames example

Trying to run a simple GraphFrame example using pyspark. spark version : 2.0 graphframe version : 0.2.0 I am able to import graphframes in Jupyter: from graphframes import GraphFrame GraphFrame graphframes.graphframe.GraphFrame I get this error…
roopalgarg
  • 429
  • 1
  • 6
  • 19
9
votes
1 answer

Build a hierarchy from a relational data-set using Pyspark

I am new to Python and stuck with building a hierarchy out of a relational dataset. It would be of immense help if someone has an idea on how to proceed with this. I have a relational data-set with data like _currentnode, childnode_ root, …
Vardhan
  • 402
  • 5
  • 13
8
votes
4 answers

No module named graphframes Jupyter Notebook

I'm following this installation guide but have the following problem with using graphframes from pyspark import SparkContext sc =SparkContext() !pyspark --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 from graphframes import…
Daniel Chepenko
  • 2,229
  • 7
  • 30
  • 56
8
votes
2 answers

Partitioning with Spark Graphframes

I'm working with a largish (?) graph (60 million vertices and 9.5 billion edges) using Spark Graphframes. The underlying data is not large - the vertices take about 500 MB on disk and the edges are about 40 GB. My containers are frequently shutting…
John
  • 1,167
  • 1
  • 16
  • 33
7
votes
4 answers

Proper subgraphing of a PySpark GraphFrame

graphframes is a network analysis tool based on PySpark DataFrames. The following code is a modified version of the tutorial subgraphing example: from graphframes.examples import Graphs import graphframes g = Graphs(sqlContext).friends() # Get…
Boris Gorelik
  • 29,945
  • 39
  • 128
  • 170
6
votes
0 answers

Graphframes: py4j.protocol.Py4JJavaError: An error occurred while calling o100.createGraph

I'm running a simple EMR cluster with Spark 2.4.4 and I want to use graphframes v0.7 to run the following code: from pyspark import * from pyspark.sql import * from graphframes import * sc=…
6
votes
0 answers

Find shortest path in a weighted digraph with GraphFrames Spark

The graphFrames package of spark is great. I can find the shortest path from "a" to "d" with the command val results = g.shortestPaths.landmarks(Seq("a", "d")).run() but how can I define a weighted graph and compute the shortest path between two…
rahram
  • 560
  • 1
  • 7
  • 21
6
votes
1 answer

Convert GraphFrames ShortestPath Map into DataFrame rows in PySpark

I am trying to find the most efficient way to take the Map output from the GraphFrames function shortestPaths and flatten each vertex's distances map into individual rows in a new DataFrame. I've been able to do it very clumsily by pulling the…
6
votes
1 answer

Finding connected components of a particular node instead of the whole graph (GraphFrame/GraphX)

I have created a GraphFrame in Spark and the graph currently looks as follows: Basically, there will be a lot of such subgraphs, each disconnected from the others. Given a particular node ID I want to find all the…
sjishan
  • 3,392
  • 9
  • 29
  • 53
5
votes
1 answer

Install package Graphframes using spark-shell

I am trying to install the PySpark package Graphframes using spark-shell: pyspark --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12 However, there is an error like this in the terminal: root@hpcc:~# pyspark --packages…
5
votes
4 answers

PySpark packages installation on kubernetes with Spark-Submit: ivy-cache file not found error

I have been fighting this the whole day. I am able to install and use a package (graphframes) with spark shell or a connected Jupyter notebook, but I would like to move it to the Kubernetes-based Spark environment with spark-submit. My spark version:…
kostjaigin
  • 125
  • 2
  • 8
5
votes
1 answer

GraphFrames: Merge edge nodes with similar column values

tl;dr: How do you simplify a graph, removing edge nodes with identical name values? I have a graph defined as follows: import graphframes from pyspark.sql import SparkSession spark = SparkSession.builder.getOrCreate() vertices =…
Julio
  • 2,261
  • 4
  • 30
  • 56
5
votes
2 answers

Using graphframes in Google Colab

How do I install graphframes on Google Colab? I tried !pip install graphframes but received the error An error occurred while calling o503.loadClass.: java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI when I call g = GraphFrame(v,e).…
5
votes
0 answers

Graphframes: BFS between two lists of vertices in spark graphframes

My aim is to find whether the max path length between two vertices is <= 4. I have a graph dataframe and a test file of the below format. I am trying to get the output column(OP) from bfs function of graph dataframes. Col1, Col2, OP a1, a4, …
5
votes
3 answers

How can I use graphframes with pyspark on AWS EMR?

I'm trying to use the graphframes package in pyspark in Jupyter Notebook (using Sagemaker and sparkmagic) on AWS EMR. I've tried adding a configuration option when creating the EMR cluster in the AWS console: [{"classification":"spark-defaults",…
Bob Swain
  • 3,052
  • 3
  • 17
  • 28