9

Suppose we have got the input in Apache GraphX as :

Vertex RDD:

val vertexArray = Array(
  (1L, "Alice"),
  (2L, "Bob"),
  (3L, "Charlie"),
  (4L, "David"),
  (5L, "Ed"),
  (6L, "Fran")
)

Edge RDD:

val edgeArray = Array(
  Edge(1L, 2L, 1),
  Edge(2L, 3L, 1),
  Edge(3L, 4L, 1),
  Edge(5L, 6L, 1)
)

I need all the components connected to a node in Apache Spark GraphX

1,[1,2,3,4]
5,[5,6]
zero323
  • 322,348
  • 103
  • 959
  • 935
Ajay Gupta
  • 3,192
  • 1
  • 22
  • 30
  • OK, so we understand what you need. What have you tried? Or are you expecting SO to write your code for you? – The Archetypal Paul Sep 16 '15 at 06:54
  • I do not expect the code but just basic outline for it.And for the question if it is required to write the stuff I have tried I think it will make the question a bit messy and not upto the point. Have seen the reference material for the Spark GraphX but was not able to get the solution for it. – Ajay Gupta Sep 16 '15 at 07:03
  • Also there's `collectNeighbours` which seemingly does what you need: http://spark.apache.org/docs/latest/graphx-programming-guide.html#collecting-neighbors – dmitry Sep 16 '15 at 09:02
  • collectNeighbors will give the information of node -> adjacent to that node list and collectNeighborsId will give just the ID of node so that wont help me for getting all the connected components – Ajay Gupta Sep 16 '15 at 09:15
  • Output of collectNeighbors : 4 -> (3,Charlie),1 -> (2,Bob),6 -> (5,Ed),3 -> (2,Bob),(4,David),5 -> (6,Fran),2 -> (1,Alice),(3,Charlie) – Ajay Gupta Sep 16 '15 at 09:23
  • You can achieve the goal manipulating by vertex and edge RDDS, the plan is the following: join vertex and edge RDD on vertex id, them map to other vertex id and finally group by key. As you are treating your graph as non-directional you may need additional manipulations, for example before joining you need to union your original edges with reversed edges (by the way you can do this and use `collectNeighbours`). – dmitry Sep 16 '15 at 10:27

1 Answers1

11

You can use ConnectedComponents which returns

a graph with the vertex value containing the lowest vertex id in the connected component containing that vertex.

and reshape results

graph.connectedComponents.vertices.map(_.swap).groupByKey
zero323
  • 322,348
  • 103
  • 959
  • 935
  • If instead the graph was 6->5, 4->3->3->1, this would produce the wrong result, I think. It would still produce the same result and instead it should be (6, [5,6], 4, [1,2,3,4])? – The Archetypal Paul Sep 16 '15 at 07:14
  • These are not strongly connected components and choice of the label is arbitrary. Using lowest id makes sense so I don't think there is a problem here. – zero323 Sep 16 '15 at 07:26
  • If the label is arbitary, yes, agreed. If the OP wanted the start of the subgraph, then there's an issue. But only the OP knows this. – The Archetypal Paul Sep 16 '15 at 07:35
  • Connected Components will find subgraphs, where each vertex has path to each other vertex. In fact it may be the whole initial graph if it is strongly connected. – dmitry Sep 16 '15 at 10:31
  • you don't answer on his question – gtzinos May 10 '18 at 23:32
  • how to implement this in Java? – shogitai Dec 30 '20 at 16:19