I am trying to run SilviaClusteringExample from SANSA-Stack/SANSA-Examples (https://github.com/SANSA-Stack/SANSA-Examples).

I have set up a Spark cluster on GCP Dataproc with one master node and three worker nodes. Following the instructions given, I ran spark-submit, specifying Hadoop filesystem paths for --input and --output.

I ran the following command:

spark-submit \
  --class net.sansa_stack.examples.spark.ml.clustering.SilviaClusteringExample \
  --master spark://<masternode_ip>:7077 \
  /home/<user_name>/sansa/SANSA-Examples-develop/sansa-examples-spark/target/sansa-examples-spark_2.11-0.6.1-SNAPSHOT-jar-with-dependencies.jar \
  --input /user/<user_name> \
  --output /user/<user_name>/out.txt

The command fails with the following error:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.jena.riot.system.RiotLib
        at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$1.apply(NTripleReader.scala:135)
        at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$1.apply(NTripleReader.scala:118)
        at net.sansa_stack.rdf.spark.io.NonSerializableObjectWrapper.instance$lzycompute(NTripleReader.scala:207)
        at net.sansa_stack.rdf.spark.io.NonSerializableObjectWrapper.instance(NTripleReader.scala:207)
        at net.sansa_stack.rdf.spark.io.NonSerializableObjectWrapper.get(NTripleReader.scala:209)
        at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$load$1.apply(NTripleReader.scala:148)
        at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$load$1.apply(NTripleReader.scala:140)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
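
A note on reading this trace, offered as a sketch rather than a diagnosis: the JVM reports "Could not initialize class" only after an earlier attempt to run that class's static initializer has already failed in the same JVM. That first failure is raised as java.lang.ExceptionInInitializerError and carries the root cause, so it is usually the more informative entry to look for in the executor logs. The helper name first_init_error and the idea of passing a log file path are assumptions for illustration; where the executor logs live depends on how the cluster is set up.

```shell
# Hypothetical helper (not from the original post): print the first
# ExceptionInInitializerError found in a log file, followed by the lines
# after it, which normally include the root "Caused by:" entry.
# Relies on grep supporting -m and -A (GNU, BSD and BusyBox grep do).
first_init_error() {
  log_file="$1"                # path to an executor log (assumed to be known)
  context_lines="${2:-20}"     # trailing lines to show; 20 is an arbitrary default
  grep -m 1 -A "$context_lines" 'ExceptionInInitializerError' "$log_file"
}
```

Calling first_init_error with the path of an executor's stderr file prints the first initializer failure, if there is one; the function exits with a non-zero status when nothing matches or the file cannot be read.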

Files in the Hadoop filesystem:

hadoop fs -ls -h 
Found 2 items
-rw-r--r--   2 <user_name> hadoop     70.2 K 2019-07-22 06:58 SilviaClustering_HairStylist_TaxiDriver.txt
-rwxr--r--   2 <user_name> hadoop          0 2019-07-22 07:09 out.txt

Please help me fix this issue. Thanks.

Stanislav Kralin
Mahek
Is it version 0.6.0? I'll check it on my machine. In the meantime, could you please post this to the mailing list: https://groups.google.com/forum/#!forum/sansa-stack It's much easier to keep track of issues there. Thanks. – UninformedUser Jul 23 '19 at 06:41

0 Answers