I am trying to process hierarchical data using the GraphX Pregel API,
and my code works fine on my local machine.
But when I run it on my Amazon EMR
cluster, it gives me an error:
java.lang.NoClassDefFoundError: Could not initialize class
What could be the reason for this? I know the class is in the jar file, because the job runs fine locally and there are no build errors.
I have included the GraphX dependency in my pom file.
Here is a snippet of the code where the error is thrown:
import org.apache.spark.graphx.{Edge, EdgeDirection, Graph}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import scala.util.hashing.MurmurHash3

def calcTopLevelHierarcy(vertexDF: DataFrame, edgeDF: DataFrame): RDD[(Any, (Int, Any, String, Int, Int))] = {
  // create the vertex RDD, keyed by a hash of the first column
  val verticesRDD = vertexDF.rdd
    .map { x => (x.get(0), x.get(1), x.get(2)) }
    .map { x => (MurmurHash3.stringHash(x._1.toString).toLong, (x._1.asInstanceOf[Any], x._2.asInstanceOf[Any], x._3.asInstanceOf[String])) }
  // create the edge RDD (top-down relationship)
  val EdgesRDD = edgeDF.rdd
    .map { x => (x.get(0), x.get(1)) }
    .map { x => Edge(MurmurHash3.stringHash(x._1.toString).toLong, MurmurHash3.stringHash(x._2.toString).toLong, "topdown") }
  // build the graph from the vertices and edges
  val graph = Graph(verticesRDD, EdgesRDD).cache()
  // pathSeperator is assumed to be defined outside this method; the local definition is commented out
  //val pathSeperator = """/"""
  // initial message: id, level, root, path, iscyclic, isleaf
  val initialMsg = (0L, 0, 0.asInstanceOf[Any], List("dummy"), 0, 1)
  val initialGraph = graph.mapVertices((id, v) => (id, 0, v._2, List(v._3), 0, v._3, 1, v._1))
  // setMsg, sendMsg and mergeMsg are defined elsewhere (not shown)
  val hrchyRDD = initialGraph.pregel(initialMsg, Int.MaxValue, EdgeDirection.Out)(setMsg, sendMsg, mergeMsg)
  // build the path from the list
  val hrchyOutRDD = hrchyRDD.vertices.map { case (id, v) => (v._8, (v._2, v._3, pathSeperator + v._4.reverse.mkString(pathSeperator), v._5, v._7)) }
  hrchyOutRDD
}
I was able to narrow the error down to this line:
val hrchyRDD = initialGraph.pregel(initialMsg, Int.MaxValue, EdgeDirection.Out)(setMsg, sendMsg, mergeMsg)
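For context, here is my understanding of what this message means on the JVM in general, as a minimal sketch. The names BrokenHolder and InitDemo are hypothetical and are not from my job, and I have not confirmed that this is what happens on EMR. As far as I know, "Could not initialize class" is reported when a class's static initializer has already failed once in the same JVM, so the class can be present in the jar and still produce this error.

```scala
// Hypothetical names throughout; none of this is taken from the job above.
object BrokenHolder {
  // Stand-in for a top-level val whose initialization throws,
  // for example because a resource it needs is missing on that machine.
  val setting: String = sys.error("initialization failed")
}

object InitDemo {
  /** Accesses BrokenHolder and returns the class name of whatever is thrown. */
  def touch(): String =
    try BrokenHolder.setting
    catch { case t: Throwable => t.getClass.getName }

  def main(args: Array[String]): Unit = {
    // First access: the initializer itself runs and fails.
    println(touch())
    // Later accesses: the JVM has marked the class as unusable.
    println(touch())
  }
}
```

In this sketch the first access should surface a java.lang.ExceptionInInitializerError wrapping the real cause, and only the later accesses report java.lang.NoClassDefFoundError. If the same thing applies to my job, the original cause might appear earlier in the executor logs than the error I pasted above.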