
I recently started working with Spark (Scala), HDFS, sbt, and Livy. I am currently trying to create a Livy batch.

Warning: Skip remote jar hdfs://localhost:9001/jar/project.jar.
java.lang.ClassNotFoundException: SimpleApp
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

This is the error shown in the Livy batch log.

My spark-submit command works perfectly fine with a local .jar file:

spark-submit --class "SimpleApp" --master local target/scala-2.11/simple-project_2.11-1.0.jar

But the same submission through Livy (via cURL) throws an error:

"requirement failed: Local path /target/scala-2.11/simple-project_2.11-1.0.jar cannot be added to user sessions."

So I moved the .jar file to HDFS. My new Livy request is:

curl -X POST --data '{
    "file": "/jar/project.jar",
    "className": "SimpleApp",
    "args": ["ddd"]
}' -H "Content-Type: application/json" http://server:8998/batches

This throws the error mentioned above.

Please let me know where I am going wrong.

Thanks in advance!

Divya Arya

3 Answers

hdfs://localhost:9001/jar/project.jar.

Livy is expecting your jar file to be located on HDFS.

If the jar is local, try specifying the protocol in the path, or just upload it to HDFS:

 "file": "file:///absolute_path/jar/project.jar",
vvg

You have to build a fat jar containing your codebase plus the necessary dependencies (via sbt assembly or the Maven equivalent), upload that jar to HDFS, and then run spark-submit against the HDFS-hosted jar, or submit it via cURL.

Steps with Scala/Java:

  1. Build the fat jar with sbt/Maven (or whatever build tool you use).
  2. Upload the fat jar to HDFS.
  3. Use cURL to submit the job (see the sketch after this list):

curl -X POST --data '{ your JSON payload here }' -H "Content-Type: application/json" your_ip:8998/batches
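
A hedged end-to-end sketch of those three steps, reusing the class name and assembly jar name from the question's comments (the HDFS directory and host are assumptions; adjust them to your setup, and note that sbt assembly requires the sbt-assembly plugin to be configured):

sbt assembly

hdfs dfs -put target/scala-2.11/SimpleProject-assembly-1.0.jar /jar/

curl -X POST -H "Content-Type: application/json" --data '{"file": "hdfs:///jar/SimpleProject-assembly-1.0.jar", "className": "SimpleApp"}' your_ip:8998/batches
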

If you don't want to build a fat jar and upload it to HDFS, you can consider Python scripts; they can be submitted as plain text, without any jar file.

An example with plain Python code:

curl your_ip:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"print(\"asdf\")"}'

In the data body you have to send valid Python code. This is the way tools like Jupyter Notebook/Torch work.
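
Note that the statement endpoint above assumes a session with id 0 already exists; a minimal sketch of creating a PySpark session first (same host/port):

curl your_ip:8998/sessions -X POST -H 'Content-Type: application/json' -d '{"kind": "pyspark"}'
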

One more example with Livy and Python, for checking the results:

curl your_ip:8998/sessions/0/statements/1

As mentioned above, for Scala/Java a fat jar uploaded to HDFS is required.

  • I have created a fat jar as per your instructions and uploaded it to HDFS, but the problem is still the same: the jar file works with a local path, i.e. "spark-submit --class "SimpleApp" --master local myProject/target/scala-2.11/SimpleProject-assembly-1.0.jar", but doesn't work with an HDFS path, i.e. "spark-submit --class "SimpleApp" --master local hdfs://localhost:9001/jar/SimpleProject-assembly-1.0.jar" – Divya Arya Jun 29 '18 at 11:36
  • @Divine You specified `local` as the master together with an HDFS jar path; that's wrong. –  Jun 29 '18 at 15:09
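
For reference, a sketch of a submission that can actually resolve an HDFS-hosted jar (this assumes a YARN cluster is available; with --master local, i.e. client mode, spark-submit skips remote jars, which is exactly the "Skip remote jar" warning in the log):

spark-submit --class "SimpleApp" --master yarn --deploy-mode cluster hdfs://localhost:9001/jar/SimpleProject-assembly-1.0.jar
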

To use local files for Livy batch jobs, you need to add the local folder to the livy.file.local-dir-whitelist property in livy.conf.

Description from livy.conf.template:

List of local directories from where files are allowed to be added to user sessions. By default it's empty, meaning users can only reference remote URIs when starting their sessions.
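
For example, a sketch of the relevant line in livy.conf (the directory is illustrative; use the directory that actually holds your jar):

livy.file.local-dir-whitelist = /home/user/myProject/target/scala-2.11/

With that in place, the batch request can reference the jar with a local path again.
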

Ralf