
I've set up a 3-node Hadoop cluster (1 master and 2 workers) with YARN and Spark.

My PySpark scripts need org.elasticsearch.spark in order to write to Elasticsearch. I provide it by passing --packages org.elasticsearch:elasticsearch-spark-30_2.12:8.4.1 to spark-submit when I run the script.
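
For reference, the submission described above can be sketched as a small Python helper that assembles the spark-submit argument list. This is only an illustration: the script name my_job.py and the helper build_spark_submit are hypothetical, and cluster deploy mode is assumed because the stack trace goes through YarnClusterApplication.

```python
import shlex

# Connector coordinates taken from the question.
ES_CONNECTOR = "org.elasticsearch:elasticsearch-spark-30_2.12:8.4.1"


def build_spark_submit(script, packages, master="yarn", deploy_mode="cluster"):
    """Return the spark-submit argument list for a PySpark script.

    `script` is a hypothetical path; `packages` is a list of Maven
    coordinates that spark-submit resolves via --packages.
    """
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if packages:
        # --packages expects a single comma-separated list of coordinates.
        cmd += ["--packages", ",".join(packages)]
    cmd.append(script)
    return cmd


cmd = build_spark_submit("my_job.py", [ES_CONNECTOR])
# Print the command as a shell-safe string; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```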

I'm stuck with this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/javax/ws/rs/core/NoContentException
        at org.apache.hadoop.yarn.util.timeline.TimelineUtils.<clinit>(TimelineUtils.java:60)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:200)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:191)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1327)
        at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1764)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.javax.ws.rs.core.NoContentException
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 13 more
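
One way to investigate a ClassNotFoundException like the one above is to check which jars on the client actually contain the missing class. The following is a rough diagnostic sketch, not part of the original post: the helper name jars_containing is hypothetical, and the directory to scan (for example the Spark jars directory) is an assumption that depends on the installation.

```python
import zipfile
from pathlib import Path

# Class name copied from the "Caused by" line of the stack trace.
MISSING_CLASS = "org.apache.hadoop.shaded.javax.ws.rs.core.NoContentException"


def jars_containing(class_name, jar_dir):
    """Return the names of the .jar files in jar_dir that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid zip archives.
            continue
    return hits


# Example call (the directory is an assumption, adjust to your setup):
# print(jars_containing(MISSING_CLASS, "/opt/spark/jars"))
```

An empty result for every directory on the client classpath would be consistent with the error, since the class loader cannot find the class anywhere.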

What I have tried:

  • I tried adding all the paths listed in this answer - https://stackoverflow.com/a/25393369/6490744 - but it doesn't work.

  • I was on Hadoop 3.1.1. After reading https://github.com/apache/incubator-kyuubi/issues/2904 (where they mention that the issue is resolved in Hadoop 3.3.3), I upgraded to 3.3.3, but the issue still persists.

  • I also tried manually downloading the jar into my spark/jars directory using wget -U "Any User Agent" https://repo1.maven.org/maven2/org/elasticsearch/elasticsearch-spark-30_2.12/8.4.1/elasticsearch-spark-30_2.12-8.4.1.jar and then ran spark-submit without passing --packages (since the jar is already on the path).

All of these attempts give me the same error.

OneCricketeer
Sowjanya R Bhat
1 Answer


After 2 hours of struggle, I got the clue from https://github.com/apache/incubator-kyuubi/issues/2904#issuecomment-1158643036:

I had yarn.timeline-service.enabled set to true in my /etc/hadoop/yarn-site.xml. After changing it to false, the error is gone.
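
The property change described above can also be scripted. Below is a minimal sketch, assuming the usual Hadoop layout of a configuration root element holding property entries with name and value children. The helper set_hadoop_property and the inline sample are illustrative, not from the original post. Note that xml.etree.ElementTree drops comments and the XML declaration when re-serializing, so back up the real file before writing anything to it.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(xml_text, name, value):
    """Return xml_text with the Hadoop property `name` set to `value`.

    The property is updated if present and appended otherwise.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            val = prop.find("value")
            if val is None:
                val = ET.SubElement(prop, "value")
            val.text = value
            break
    else:
        # No matching property found: add a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative stand-in for the relevant part of /etc/hadoop/yarn-site.xml.
sample = """<configuration>
  <property>
    <name>yarn.timeline-service.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

updated = set_hadoop_property(sample, "yarn.timeline-service.enabled", "false")
print(updated)
```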

Now I wonder how to set up the YARN timeline server.

Sowjanya R Bhat
  • Thanks, I have been struggling with this issue for a really long time. I went to the link above 1000 times but never thought about changing the property. It works for me as well now that I have updated the configuration. Not sure what platform you are using, but on EMR you can update yarn-site.xml during the bootstrap phase. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html – Akshay Nov 16 '22 at 18:03
  • @Akshay glad it helped you! Thanks for the EMR link. I'm currently not using the cloud, but this will definitely help me in the future – Sowjanya R Bhat Nov 16 '22 at 18:58