
I know this is a trivial question, but I could not find the answer on the internet.

I am trying to run a Java class with a main method that takes program arguments (String[] args).

However, when I submit the job using spark-submit and pass program arguments as I would do with

java -cp <some jar>.jar <Some class name> <arg1> <arg2>

it does not read the args.

The command I tried running was

bin/spark-submit analytics-package.jar --class full.package.name.ClassName 1234 someargument someArgument

and this gives

Error: No main class set in JAR; please specify one with --class

and when I tried:

bin/spark-submit --class full.package.name.ClassName 1234 someargument someArgument analytics-package.jar 

I get

Warning: Local jar /mnt/disk1/spark/1 does not exist, skipping.
java.lang.ClassNotFoundException: com.relcy.analytics.query.QueryAnalytics
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:176)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:693)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:183)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:208)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:122)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

How can I pass these arguments? They change frequently on each run of the job, and they need to be passed as arguments.

    You're supposed to pass the arguments after the jar. See the documentation on submitting Spark applications: http://spark.apache.org/docs/latest/submitting-applications.html – Ton Torres Mar 16 '16 at 00:25

4 Answers


Arguments passed before the .jar file are treated as arguments to the JVM, whereas arguments passed after the jar file are passed on to the user's program.

bin/spark-submit --class classname -Xms256m -Xmx1g something.jar someargument

Here, s will equal someargument, while -Xms and -Xmx are passed to the JVM.

public class ClassName {
    public static void main(String[] args) {
        String s = args[0]; // receives "someargument"
    }
}
  • Are you sure about that? I am using Spark 1.6.2 on YARN and I receive all of the arguments, including **--class classname ...** – I get everything – David H Dec 01 '16 at 09:24

I found the correct command from this tutorial.

The command should be of the form:

bin/spark-submit --class full.package.name.ClassName analytics-package.jar someargument someArgument
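
With this ordering, everything after the jar path is handed to the application's main method unchanged. A minimal sketch of the entry point (the class name and the helper method are illustrative, not from the original post):

```java
// spark-submit passes "someargument" and "someArgument" straight
// through to main() once the jar path has been seen.
public class ClassName {
    public static void main(String[] args) {
        System.out.println("first arg: " + firstArg(args)); // "someargument"
    }

    // Tiny helper so the behavior is easy to check in isolation.
    static String firstArg(String[] args) {
        return args[0];
    }
}
```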
spark-submit --class SparkWordCount --master yarn --jars <jar1.jar>,<jar2.jar> sparkwordcount-1.0.jar /user/user01/input/alice.txt /user/user01/output

The first unrecognized argument is treated as the primaryResource (the jar file, in our case). Check out SparkSubmitArguments.handleUnknown.

All the arguments after the primaryResource are treated as arguments to the application. Check out SparkSubmitArguments.handleExtraArgs.

To better understand how the arguments are parsed, check out SparkSubmitOptionParser.parse; the above two methods are called from it.
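
That ordering rule can be mimicked with a small sketch (a simplified illustration, not Spark's actual parser, which handles many more options): known flags are consumed with their values, the first unrecognized token becomes the primary resource, and everything after it is forwarded to the application untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of spark-submit's argument handling.
public class SubmitArgsSketch {
    String mainClass;
    String primaryResource;
    List<String> appArgs = new ArrayList<>();

    void parse(String[] args) {
        int i = 0;
        while (i < args.length) {
            if (primaryResource == null && args[i].equals("--class")) {
                mainClass = args[++i];       // a known option that takes a value
            } else if (primaryResource == null) {
                primaryResource = args[i];   // like handleUnknown: first unrecognized token
            } else {
                appArgs.add(args[i]);        // like handleExtraArgs: forwarded to the app
            }
            i++;
        }
    }
}
```

This also shows why the jar-first command fails: the jar becomes the primary resource immediately, so --class and everything after it are treated as application arguments and never reach the submit tool.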
