I'm using GraphX on Spark for an experiment, and the current step is to extract a subgraph of a generated graph. I've checked that the original graph was generated successfully: not only does the lazy lineage evaluate fine, but graph.vertices.first()
also displays the correct result. Now my subgraph code is:
val reg = "(\\d*)11".r
val graphUSA_subgraph = graphUSA.subgraph(
  vpred = (id, user) => {
    id.toString match {
      case reg(x) => true
      case _      => false
    }
  }
)
graphUSA_subgraph.vertices.first()
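For reference, the predicate itself behaves as intended when checked outside Spark; here is a minimal standalone sketch (plain Scala, hypothetical ids):

```scala
// Standalone check of the vertex predicate, no Spark involved.
// In a pattern match, a Regex extractor requires a full match,
// so this keeps exactly the ids whose decimal form ends in "11".
val reg = "(\\d*)11".r

def keepVertex(id: Long): Boolean = id.toString match {
  case reg(_) => true
  case _      => false
}

println(keepVertex(42311L)) // true
println(keepVertex(1234L))  // false
```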
I meant to get a subgraph containing only the nodes whose index ends with "11". I've checked the Boolean
block in vpred = (id, user) => Boolean
and the logic is correct. What confuses me is that when I ran the code in the Spark shell it raised an error, and the log is as follows:
Exception in task * in stage *...
java.io.InvalidClassException:...
unable to create instance
at java.io.ObjectInputStream. ...
...
Caused by: org.apache.spark.SparkException: Only one SparkContext may be running in this JVM ... The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:123)
The error is not caused by Graph.subgraph()
itself, because when I ran a simpler version:
val graph_subgraph_1 = graph.subgraph(
  vpred = (id, user) => id.toString.endsWith("00")
)
graph_subgraph_1.vertices.first()
Everything went fine.
And then I tried another version that doesn't refer to the reg
val defined outside the closure:
val graphUSA_subgraph_0 = graphUSA.subgraph(
  vpred = (id, user) => {
    id.toString.drop(id.toString.length - 2) match {
      case "11" => true
      case _    => false
    }
  }
)
graphUSA_subgraph_0.vertices.first()
Everything went fine too.
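For what it's worth, the two predicates agree on every id when checked in plain Scala outside Spark, so the difference in behaviour can't come from the matching logic itself (a standalone sketch with hypothetical ids):

```scala
val reg = "(\\d*)11".r

// Predicate as written with the regex extractor (full match).
def viaRegex(id: Long): Boolean = id.toString match {
  case reg(_) => true
  case _      => false
}

// Predicate as written with drop + literal match on the last two chars.
def viaDrop(id: Long): Boolean =
  id.toString.drop(id.toString.length - 2) match {
    case "11" => true
    case _    => false
  }

val ids = Seq(7L, 11L, 311L, 1234L, 9011L)
assert(ids.forall(id => viaRegex(id) == viaDrop(id)))
```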
I'm wondering at which step a new SparkContext
is implicitly created in the pipeline. It seems quite possible that referring to a val
(reg
) outside the function has caused it.
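That hunch matches a common Spark pitfall. When vpred references reg, the closure also drags in the enclosing object that holds reg (the REPL line object or the surrounding class). If that object isn't serializable, or deserializing it on an executor re-runs driver-side initialization, tasks fail with exactly this kind of InvalidClassException, and the "Only one SparkContext" message is plausibly a side effect of the line object being rebuilt on a worker. Below is a minimal sketch of the capture problem in plain Scala without Spark (class and method names are hypothetical, and findFirstIn stands in for the real predicate since only serialization is being demonstrated):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stands in for the REPL line object / enclosing class, which is not Serializable.
class Driver {
  val reg = "(\\d*)11".r

  // Referencing the field `reg` makes the lambda capture `this`, i.e. all of Driver.
  val capturesOuter: Long => Boolean =
    id => reg.findFirstIn(id.toString).isDefined

  // Copying the field into a local val first means only the Regex is captured,
  // and scala.util.matching.Regex is Serializable.
  val capturesRegexOnly: Long => Boolean = {
    val localReg = reg
    id => localReg.findFirstIn(id.toString).isDefined
  }
}

// Returns true iff `obj` survives plain Java serialization,
// which is what Spark does to task closures.
def serializableByJava(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj)
    true
  } catch {
    case _: NotSerializableException => false
  }

val d = new Driver
println(serializableByJava(d.capturesOuter))     // false: drags in Driver
println(serializableByJava(d.capturesRegexOnly)) // true: only the Regex is captured
```

If this is indeed the cause, the usual fixes apply the same idea to the subgraph call: declare the regex as a local val inside vpred, or mark the outer reference @transient, so the closure cleaner never has to ship the enclosing object to the executors.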
I've been struggling with this block for quite some time, and would be grateful if anyone could shed some light on it. Thanks in advance!