8

How to use subgraph function to get a graph that would include only vertexes and edges from the specific connected component? Let's say I know the connected component id, the final goal is to create a new graph based on the connected component. I'd like to keep the vertex attributes from the original graph.

Jacek Laskowski
  • 72,696
  • 27
  • 242
  • 420
Oleg Baydakov
  • 81
  • 1
  • 2

2 Answers2

6

You have to join the graph with the component IDs to the original graph, filter (take the subgraph) by the component ID, and then discard the component ID.

import scala.reflect._
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.ConnectedComponents

def getComponent[VD: ClassTag, ED: ClassTag](
    g: Graph[VD, ED], component: VertexId): Graph[VD, ED] = {
  val cc: Graph[VertexId, ED] = ConnectedComponents.run(g)
  // Join component ID to the original graph.
  val joined = g.outerJoinVertices(cc.vertices) {
    (vid, vd, cc) => (vd, cc)
  }
  // Filter by component ID.
  val filtered = joined.subgraph(vpred = {
    (vid, vdcc) => vdcc._2 == Some(component)
  })
  // Discard component IDs.
  filtered.mapVertices {
    (vid, vdcc) => vdcc._1
  }
}
Daniel Darabos
  • 26,991
  • 10
  • 102
  • 114
  • Thanks for this function! But, can we not assume that if you have the CC id thenyou have probably already constructed the CC, such that this can be pulled in as one of `getComponent`'s arguments saving the potentially expensive task of recomputing. I am curious as to why this function works, but when I did the steps manually putting a `Long` or whatever in as the `component` it fails. Must it be cast `Some` or `VertexId`? – SpmP Sep 22 '15 at 19:31
  • You're right — now that I look at it it's quite strange how we would have a component ID before even running `ConnectedComponents`. I guess I just wanted to include `ConnectedComponents` somewhere. I'll rework the code to be more sensible. For your other question: `VertexId` is an alias for `Long`. A plain old number should work there. What do you mean by "fails" here? Do you get an exception? – Daniel Darabos Sep 22 '15 at 19:36
  • Do you know how to count the number of connected components in a graph? – ELI Nov 08 '17 at 01:44
  • It should be `cc.vertices.values.distinct.count`. – Daniel Darabos Nov 10 '17 at 12:16
3

I take your question to be, given a VertexId in a source graph, create a new graph with the nodes and edges connected to this VertexId from the source graph.

Given that, here's what I would do:

val targetVertexId = ...
val graph = Graph(..., ...)
val newGraph = Graph(
  graph.vertices.filter{case (vid,attr) => vid == targetVertexId} ++
  graph.collectNeighbors(EdgeDirection.Either)
    .filter{ case (vid,arr) => vid == targetVertexId}
    .flatMap{ case (vid,arr) => arr},
  graph.edges
).subgraph(vpred = { case (vid,attr) => attr != null})

Couple of things to note:

You can change EdgeDirection.Either to EdgeDirection.In or EdgeDirection.Out as needed.

The .subgraph at the end removes all Vertices where the attribute is set to null. If the original val graph has Vertices with attributes set to null this isn't going to work. Otherwise this works, without having to know the Vertex attribute type in advance.

David Griffin
  • 13,677
  • 5
  • 47
  • 65