I am trying to run the SilviaClusteringExample from SANSA-Examples (https://github.com/SANSA-Stack/SANSA-Examples).
I set up a Spark cluster on GCP Dataproc with one master node and three worker nodes. Following the instructions, I ran spark-submit with HDFS paths for the --input and --output arguments:
spark-submit \
  --class net.sansa_stack.examples.spark.ml.clustering.SilviaClusteringExample \
  --master spark://<masternode_ip>:7077 \
  /home/<user_name>/sansa/SANSA-Examples-develop/sansa-examples-spark/target/sansa-examples-spark_2.11-0.6.1-SNAPSHOT-jar-with-dependencies.jar \
  --input /user/<user_name> \
  --output /user/<user_name>/out.txt
This command fails with the following error:
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.jena.riot.system.RiotLib
at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$1.apply(NTripleReader.scala:135)
at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$1.apply(NTripleReader.scala:118)
at net.sansa_stack.rdf.spark.io.NonSerializableObjectWrapper.instance$lzycompute(NTripleReader.scala:207)
at net.sansa_stack.rdf.spark.io.NonSerializableObjectWrapper.instance(NTripleReader.scala:207)
at net.sansa_stack.rdf.spark.io.NonSerializableObjectWrapper.get(NTripleReader.scala:209)
at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$load$1.apply(NTripleReader.scala:148)
at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$load$1.apply(NTripleReader.scala:140)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
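Since the executor cannot initialize org.apache.jena.riot.system.RiotLib, the first thing I want to verify is that the class is actually bundled inside the jar-with-dependencies assembly. The sketch below shows the check; the real command against my jar is in the comment, and the runnable part demonstrates it on a tiny stand-in jar (a jar is just a zip archive), since I can't assume anyone else has my assembly:

```shell
# On the cluster I would run the check directly against my assembly:
#
#   jar tf sansa-examples-spark_2.11-0.6.1-SNAPSHOT-jar-with-dependencies.jar \
#     | grep 'org/apache/jena/riot/system/RiotLib'
#
# Self-contained demonstration of the same check on a stand-in jar:
mkdir -p demo/org/apache/jena/riot/system
touch demo/org/apache/jena/riot/system/RiotLib.class
python3 -m zipfile -c demo.jar demo/   # build the stand-in "jar" (a zip)
# Does the archive contain the class the executor fails to load?
python3 -m zipfile -l demo.jar | grep 'org/apache/jena/riot/system/RiotLib.class'
```

If the grep finds nothing in the real assembly, the problem is the build; if the class is there, I assume it is a classpath conflict at runtime.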
The files currently in HDFS:
hadoop fs -ls -h
Found 2 items
-rw-r--r-- 2 <user_name> hadoop 70.2 K 2019-07-22 06:58 SilviaClustering_HairStylist_TaxiDriver.txt
-rwxr--r-- 2 <user_name> hadoop 0 2019-07-22 07:09 out.txt
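If the class is present in the assembly but an older Jena from the cluster's classpath wins at load time, one workaround I am considering (an assumption on my part, not verified) is telling Spark to prefer the application jar's classes via the standard userClassPathFirst properties:

```shell
# Hypothetical variant of my spark-submit: prefer classes from the
# application jar over the cluster classpath. The two --conf properties are
# standard Spark settings; whether they resolve this particular Jena
# conflict is my assumption.
spark-submit \
  --class net.sansa_stack.examples.spark.ml.clustering.SilviaClusteringExample \
  --master spark://<masternode_ip>:7077 \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  /home/<user_name>/sansa/SANSA-Examples-develop/sansa-examples-spark/target/sansa-examples-spark_2.11-0.6.1-SNAPSHOT-jar-with-dependencies.jar \
  --input /user/<user_name> \
  --output /user/<user_name>/out.txt
```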
Any help with fixing this error would be appreciated. Thanks.