2

I have a parent Graph that I want to filter into multiple subgraphs, so I can apply a function to each subgraph and extract some data. My code looks like this:

val myTerms = <RDD of terms I want to use to filter the graph>
val myVertices = ...
val myEdges = ...
val myGraph = Graph(myVertices, myEdges)

val myResults : RDD[(<Tuple>)] = myTerms.map { x => mySubgraphFunction(myGraph, x) }

Where mySubgraphFunction is a function that creates a subgraph, performs a calculation, and returns a tuple of result data.

When I run this, I get a Java null pointer exception at the point that mySubgraphFunction calls GraphX.subgraph. If I call collect on the RDD of terms, I can get this to work (also added persist on the RDDs for performance):

val myTerms = <RDD of terms I want to use to filter the graph>
val myVertices = <read RDD>.persist(StorageLevel.MEMORY_ONLY_SER)
val myEdges = <read RDD>.persist(StorageLevel.MEMORY_ONLY_SER)
val myGraph = Graph(myVertices, myEdges)

val myResults : Array[(<Tuple>)] = myTerms.collect().map { x =>
                 mySubgraphFunction(myGraph, x) }

Is there a way to get this to work where I don't have to call collect() (i.e. make this a distributed operation)? I'm creating ~1k subgraphs and the performance is slow.

John
  • 1,167
  • 1
  • 16
  • 33
  • There is no way to make it work. You cannot start action or transformation inside another action or transformation. Poor performance you see is most likely related to the fact that with 1K terms you have to start >= 1K jobs and it is simply expensive. Without `mySubgraphFunction` is hard to say anything more, but I assume that you already process your data in parallel. If you have a lot of resources compared to amount of data you can start multiple jobs asynchronously (for example using [futures](http://stackoverflow.com/a/31916657/1560062)). – zero323 Sep 02 '15 at 17:45
  • @zero323 Is it possible that graphframe vertex filter might help solve this? http://graphframes.github.io/user-guide.html#subgraphs – J Maurer Aug 05 '16 at 18:24

0 Answers0