Questions tagged [spark-graphx]

GraphX is a component in Apache Spark for graphs and graph-parallel computation.

At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API.
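
As a quick illustration of these operators, here is a minimal sketch (assuming a live SparkContext named sc; the toy vertex and edge data are made up) that builds a small property graph and uses aggregateMessages to count each user's followers:

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    // Toy property graph: vertices carry a name, edges carry a "follows" weight.
    val users: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val follows: RDD[Edge[Int]] =
      sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(3L, 2L, 1), Edge(2L, 1L, 1)))
    val graph: Graph[String, Int] = Graph(users, follows)

    // aggregateMessages: every edge sends 1 to its destination and messages are summed
    // per vertex, which yields each user's in-degree (number of followers).
    val followerCounts: VertexRDD[Int] =
      graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _)

    followerCounts.collect().foreach { case (id, n) => println(s"$id has $n followers") }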

In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
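
For the builders and bundled algorithms, a minimal sketch could look like the following (again assuming sc; the edge-list path is a placeholder, and GraphLoader expects one "srcId dstId" pair per line):

    import org.apache.spark.graphx.GraphLoader

    // GraphLoader.edgeListFile is one of the graph builders.
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///path/to/edges.txt")

    // Two of the bundled algorithms: PageRank (run to a tolerance) and connected components.
    val ranks      = graph.pageRank(0.0001).vertices      // (VertexId, rank)
    val components = graph.connectedComponents().vertices // (VertexId, smallest id in its component)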

487 questions
22 votes · 5 answers

Neo4j or GraphX / Giraph what to choose?

I've just started my excursion into graph-processing methods and tools. What we basically do is compute some standard metrics like PageRank, clustering coefficient, triangle count, diameter, connectivity, etc. In the past I was happy with Octave, but when we…
Roman · 257 · 1 · 2 · 4
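
For the metrics mentioned in the question above, GraphX ships PageRank and triangle counting out of the box; a hedged sketch (assuming a SparkContext sc, a placeholder edge-list file, and an arbitrary tolerance) might be:

    import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

    // Older GraphX releases require a canonical orientation (srcId < dstId) and an explicit
    // partitionBy before triangleCount, so both are applied here.
    val graph = GraphLoader.edgeListFile(sc, "edges.txt", canonicalOrientation = true)
      .partitionBy(PartitionStrategy.RandomVertexCut)

    val pageRanks = graph.pageRank(0.0001).vertices   // per-vertex PageRank
    val triangles = graph.triangleCount().vertices    // per-vertex triangle counts
    // A clustering-coefficient-style number can be derived from triangles and degrees;
    // diameter, however, has no built-in GraphX implementation.
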
17 votes · 6 answers

Timeout Exception in Apache Spark during program execution

I am running a Bash script on a Mac. The script calls a Spark method written in Scala a large number of times. I am currently trying to call this Spark method 100,000 times using a for loop. The code exits with the following…
Yasir Arfat · 645 · 1 · 8 · 21
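
Without the full stack trace it is unclear which timeout fires, but for the common "Futures timed out" case one knob is spark.network.timeout; the sketch below (values arbitrary) also hints that looping inside one SparkContext is usually far cheaper than 100,000 separate spark-submit runs:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: raise the default network/RPC timeout; "600s" is an arbitrary value.
    val conf = new SparkConf()
      .setAppName("graphx-batch")
      .set("spark.network.timeout", "600s")
    val sc = new SparkContext(conf)
    // Reuse this single context for all 100,000 calls instead of re-launching the JVM each time.
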
16 votes · 5 answers

Spark - Container is running beyond physical memory limits

I have a cluster of two worker nodes: Worker_Node_1 with 64 GB RAM and Worker_Node_2 with 32 GB RAM. Background summary: I am trying to execute spark-submit in yarn-cluster mode to run Pregel on a graph and calculate the shortest-path distances from one source…
mn0102 · 839 · 1 · 12 · 25
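
The "running beyond physical memory limits" message means YARN killed the container; a common, hedged adjustment (exact values depend on the cluster) is to leave more headroom between the executor heap and the YARN container:

    import org.apache.spark.SparkConf

    // Sketch: heap plus overhead must fit inside the YARN container. Values are placeholders.
    val conf = new SparkConf()
      .set("spark.executor.memory", "8g")                 // JVM heap per executor
      .set("spark.yarn.executor.memoryOverhead", "2048")  // extra MB reserved per container (Spark 1.x/2.x key)
      .set("spark.driver.memory", "4g")
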
10 votes · 1 answer

Implement a directed Graph as an undirected graph using GraphX

I have the following directed graph, given by the nodes and edges below. Nodes: 1, 2, 3, 4, 5. Edges: (1,2), (1,3), (1,4), (2,5), (3,4), (3,5), (4,5). How do I convert this directed graph to an undirected graph? Do I have to use a built-in method? If…
Yasir Arfat · 645 · 1 · 8 · 21
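
GraphX graphs are always directed multigraphs, so "undirected" is usually emulated; one hedged approach for the question above (assuming sc) is to add the reverse of every edge, or, inside algorithms, to traverse with EdgeDirection.Either:

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    // The vertices and edges from the question, with a dummy attribute on each.
    val vertices: RDD[(VertexId, Int)] = sc.parallelize((1L to 5L).map(id => (id, 0)))
    val directed: RDD[Edge[Int]] = sc.parallelize(Seq(
      Edge(1L, 2L, 0), Edge(1L, 3L, 0), Edge(1L, 4L, 0),
      Edge(2L, 5L, 0), Edge(3L, 4L, 0), Edge(3L, 5L, 0), Edge(4L, 5L, 0)))

    // "Undirected" by symmetrizing: keep each edge plus its reverse.
    val symmetric = directed.union(directed.map(e => Edge(e.dstId, e.srcId, e.attr)))
    val undirectedGraph = Graph(vertices, symmetric)
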
10 votes · 2 answers

Implementing topological sort in Spark GraphX

I am trying to implement topological sort using Spark's GraphX library. This is the code I've written so far: MyObject.scala import java.util.ArrayList import scala.collection.mutable.Queue import org.apache.spark.SparkConf import…
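
One way to approach the question above, sketched under the assumption that the graph is a DAG, is a Kahn-style peel: repeatedly collect and remove the vertices whose in-degree is zero. It runs one Spark job per level, so it is illustrative rather than scalable:

    import org.apache.spark.graphx._
    import scala.collection.mutable.ArrayBuffer

    // `graph` (Graph[VD, ED]) is assumed to exist already and to be acyclic.
    var g = graph.outerJoinVertices(graph.inDegrees) { (_, _, deg) => deg.getOrElse(0) }.cache()
    val order = ArrayBuffer[VertexId]()

    while (g.numVertices > 0) {
      // Vertices with no remaining incoming edges come next in the topological order.
      val ready = g.vertices.filter { case (_, deg) => deg == 0 }.keys.collect()
      if (ready.isEmpty) sys.error("graph contains a cycle")
      order ++= ready

      val readySet = ready.toSet
      val reduced = g.subgraph(vpred = (id, _) => !readySet.contains(id))
      g = reduced.outerJoinVertices(reduced.inDegrees) { (_, _, deg) => deg.getOrElse(0) }.cache()
    }
    // `order` now holds one valid topological ordering of the vertex ids.
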
10 votes · 4 answers

GraphX Visualization

I am looking for a way to visualize a graph constructed in Spark's GraphX. As far as I know, GraphX doesn't have any visualization methods, so I need to export the data from GraphX to another graph library, but I am stuck here. I ran into this…
Saygın Doğu · 305 · 1 · 4 · 17
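
Since GraphX itself has no rendering, one hedged pattern for the question above is to dump a small or sampled graph (called graph here) into a plain text format that external tools such as Graphviz or Gephi can read; the file names below are placeholders:

    // Build a tiny DOT document from the edge triplets. Collecting to the driver only
    // makes sense for small or pre-filtered graphs.
    val edgeLines = graph.triplets.map(t => s"${t.srcId} -> ${t.dstId};")
    val dot = "digraph g {\n" + edgeLines.collect().mkString("\n") + "\n}"
    // Save `dot` to a .dot file and render it with Graphviz, or write a CSV edge list
    // for Gephi: graph.edges.map(e => s"${e.srcId},${e.dstId}").saveAsTextFile("out/edges_csv")
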
9 votes · 2 answers

Spark: What is the time complexity of the connected components algorithm used in GraphX?

GraphX comes with an algorithm for finding connected components of a graph. I did not find a statement about the complexity of their implementation. Generally, finding connected components can be done in linear time, for instance by a breadth-first…
9 votes · 1 answer

Get all the nodes connected to a node in Apache Spark GraphX

Suppose we have the following input in Apache Spark GraphX. Vertex RDD: val vertexArray = Array( (1L, "Alice"), (2L, "Bob"), (3L, "Charlie"), (4L, "David"), (5L, "Ed"), (6L, "Fran") ) Edge RDD: val edgeArray = Array( Edge(1L, 2L, 1), …
Ajay Gupta · 3,192 · 1 · 22 · 30
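
If "connected to" means reachable when edge direction is ignored, one hedged sketch for the data above (assuming sc) is to run connectedComponents and keep every vertex that shares the source vertex's component label; the extra edges are made up, since the excerpt is truncated:

    import org.apache.spark.graphx._

    val vertexArray = Array((1L, "Alice"), (2L, "Bob"), (3L, "Charlie"),
                            (4L, "David"), (5L, "Ed"), (6L, "Fran"))
    val edgeArray = Array(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 4L, 1)) // remaining edges invented

    val graph = Graph(sc.parallelize(vertexArray), sc.parallelize(edgeArray))

    // connectedComponents labels every vertex with the smallest VertexId in its component.
    val cc = graph.connectedComponents().vertices             // RDD[(VertexId, VertexId)]
    val sourceId: VertexId = 1L                                // the node we start from
    val sourceComp = cc.lookup(sourceId).head                  // its component label
    val connectedIds = cc.filter { case (_, comp) => comp == sourceComp }.keys.collect()
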
8 votes · 0 answers

Apache Spark GraphX: java.lang.ArrayIndexOutOfBoundsException: -1

We have hit a bug in GraphX when calling the connectedComponents function, where it fails with the following error: java.lang.ArrayIndexOutOfBoundsException: -1. I've found this bug report: https://issues.apache.org/jira/browse/SPARK-5480 Has…
Andy Long · 706 · 5 · 15
8 votes · 1 answer

No valid constructor on Spark

This is my code: class FNNode(val name: String) case class Ingredient(override val name: String, category: String) extends FNNode(name) val ingredients: RDD[(VertexId, FNNode)] = sc.textFile(PATH+"ingr_info.tsv"). filter(!…
elelias · 4,552 · 5 · 30 · 45
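
The "no valid constructor" error usually comes from Java serialization: when a Serializable subclass (a case class here) extends a non-serializable superclass, that superclass must have a no-argument constructor, which FNNode above lacks. A hedged sketch of one fix is simply to make the base class serializable:

    // Making the base class Serializable avoids the "no valid constructor" check at
    // deserialization time (case classes are already Serializable).
    class FNNode(val name: String) extends Serializable
    case class Ingredient(override val name: String, category: String) extends FNNode(name)
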
8 votes · 2 answers

Apache Spark GraphX connected components

How do I use the subgraph function to get a graph that includes only the vertices and edges of a specific connected component? Let's say I know the connected component ID; the final goal is to create a new graph based on that connected component. I'd…
Oleg Baydakov · 81 · 1 · 2
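
A hedged sketch for the question above: label each vertex of the full graph (called graph here) with its component id, then call subgraph with a vertex predicate on that label; componentId stands in for the id you already know:

    import org.apache.spark.graphx._

    // Attach each vertex's component label alongside its original attribute.
    val cc = graph.connectedComponents()
    val labeled = graph.outerJoinVertices(cc.vertices) { (_, attr, comp) => (attr, comp.getOrElse(-1L)) }

    val componentId: VertexId = 1L   // placeholder for the known component id

    // subgraph keeps only vertices in that component, plus edges whose two endpoints both survive.
    val component = labeled
      .subgraph(vpred = (_, v) => v._2 == componentId)
      .mapVertices((_, v) => v._1)   // drop the helper label again
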
8 votes · 2 answers

How to get the actual SSSP path with Apache Spark GraphX?

I have run the single-source shortest path (SSSP) example from the Spark site, as follows: graphx-SSSP Pregel example. Code (Scala): object Pregel_SSSP { def main(args: Array[String]) { val sc = new SparkContext("local", "Allen Pregel Test",…
AllenChen · 81 · 1 · 4
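
A hedged variation on the linked Pregel SSSP example: carry a (distance, path) pair in every vertex instead of just the distance, so the actual route is available at the end. It assumes a graph with Double edge weights and a chosen sourceId:

    import org.apache.spark.graphx._

    val sourceId: VertexId = 1L   // placeholder source vertex

    // Vertex state: (best known distance, path from the source realising that distance).
    val init = graph.mapVertices { (id, _) =>
      if (id == sourceId) (0.0, List(sourceId)) else (Double.PositiveInfinity, List.empty[VertexId])
    }

    val sssp = init.pregel((Double.PositiveInfinity, List.empty[VertexId]))(
      (_, attr, msg) => if (msg._1 < attr._1) msg else attr,          // keep the better (dist, path)
      triplet => {                                                    // relax each edge
        val candidate = triplet.srcAttr._1 + triplet.attr
        if (candidate < triplet.dstAttr._1)
          Iterator((triplet.dstId, (candidate, triplet.srcAttr._2 :+ triplet.dstId)))
        else Iterator.empty
      },
      (a, b) => if (a._1 < b._1) a else b                             // take the shorter of two offers
    )

    sssp.vertices.collect().foreach { case (id, (dist, path)) =>
      println(s"$id: dist=$dist via ${path.mkString(" -> ")}")
    }
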
7 votes · 1 answer

Spark GraphX Aggregation Summation

I'm trying to compute the sum of node values in a Spark GraphX graph. In short, the graph is a tree and the top node (root) should sum all children and their children. My graph is actually a tree that looks like this, and the expected summed value…
will · 121 · 1 · 1 · 9
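
One hedged way to tackle the question above, assuming the tree is a Graph[Double, Int] named graph with edges pointing from parent to child: iterate aggregateMessages so that after k rounds each vertex holds the sum of its subtree down to depth k. Running at least tree-height rounds therefore yields full subtree sums, and the root ends up with the grand total. maxDepth is a made-up bound:

    import org.apache.spark.graphx._

    val maxDepth = 10   // assumed upper bound on the tree height

    // State per vertex: (own value, running subtree sum).
    var sums: Graph[(Double, Double), Int] = graph.mapVertices((_, v) => (v, v))

    for (_ <- 1 to maxDepth) {
      // Every child reports its current subtree sum to its parent (edge src = parent, dst = child).
      val fromChildren: VertexRDD[Double] =
        sums.aggregateMessages[Double](ctx => ctx.sendToSrc(ctx.dstAttr._2), _ + _)

      sums = sums.outerJoinVertices(fromChildren) { (_, attr, msg) =>
        (attr._1, attr._1 + msg.getOrElse(0.0))
      }
    }
    // After the loop, each vertex's second field is the sum over its subtree.
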
7 votes · 2 answers

Spark: GraphX API OOM errors after unpersisting unused RDDs

I have hit an out-of-memory error for unknown reasons. I release the unneeded RDDs immediately, but after several rounds of the loop the OOM error still appears. My code is as follows: // single source shortest path def…
bourneli · 2,172 · 4 · 24 · 40
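
It is hard to diagnose without the complete code, but one pattern that often matters in iterative GraphX loops: materialize the new graph before unpersisting the previous one, and unpersist both its vertices and its edges, otherwise old cached RDDs (or an ever-growing lineage) keep memory occupied. A sketch with hypothetical initialGraph, numIterations and step names:

    // `step` stands in for one round of the shortest-path loop in the question.
    var g = initialGraph.cache()
    for (i <- 1 to numIterations) {
      val prev = g
      g = step(g).cache()

      // Force evaluation of the new graph while the old one is still cached...
      g.vertices.count()
      g.edges.count()

      // ...then release the previous iteration's vertices and edges.
      prev.unpersistVertices(blocking = false)
      prev.edges.unpersist(blocking = false)
    }
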
7 votes · 2 answers

Spark - GraphX - scaling connected components

I am trying to use connected components but am having an issue with scaling. Here is what I have: // get vertices val vertices = stage_2.flatMap(x => GraphUtil.getVertices(x)).cache // get edges val edges = stage_2.map(x =>…
Shirish Kumar · 1,532 · 17 · 23
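
For the scaling issue above, two hedged knobs that are often tried before adding memory: pick an explicit edge PartitionStrategy and cache the graph before running the iterative algorithm. numPartitions is a placeholder to be sized to the cluster, and vertices/edges stand for the RDDs built in the question:

    import org.apache.spark.graphx._

    val partitioned = Graph(vertices, edges)
      .partitionBy(PartitionStrategy.EdgePartition2D, numPartitions = 400)
      .cache()

    val cc = partitioned.connectedComponents().vertices
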