
I am using SparkLauncher in Spark v1.6.0. My problem is that when I use this class to launch my Spark jobs, it returns immediately and no job is submitted. My code is as follows.

new SparkLauncher()
 .setAppName("test word count")
 .setAppResource("file://c:/temp/my.jar")
 .setMainClass("my.spark.app.Main")
 .setMaster("spark://master:7077")
 .startApplication(new SparkAppHandle.Listener() {
   @Override public void stateChanged(SparkAppHandle h) { }
   @Override public void infoChanged(SparkAppHandle h) { }
  });

When I debug into the code, I notice, to my surprise, that all this class really does is call the spark-submit.cmd script using ProcessBuilder:

[C:/tmp/spark-1.6.0-bin-hadoop2.6/bin/spark-submit.cmd, --master, spark://master:7077, --name, "test word count", --class, my.spark.app.Main, C:/temp/my.jar]

However, if I run this command (the one that is run by ProcessBuilder) directly on the console, a Spark job is submitted. Any ideas on what's going on?

There's another method, SparkLauncher.launch(), that is available, but the javadocs recommend avoiding it.
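
For completeness, launch() returns a plain java.lang.Process; a minimal sketch of what using it would look like (my assumption of typical usage, with output handling only hinted at):

Process spark = new SparkLauncher()
    .setAppResource("file://c:/temp/my.jar")
    .setMainClass("my.spark.app.Main")
    .setMaster("spark://master:7077")
    .launch();
// The child's stdout/stderr must be consumed or redirected, otherwise it can block.
int exitCode = spark.waitFor(); // blocks until spark-submit exits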


Jane Wayne
  • Have you tried putting some code into the two listener methods to report state and info changes as your app is submitted? – Chris Gerken Mar 20 '16 at 11:31
  • So, what happened? Did you get it to work using SparkLauncher? I know SparkLauncher internally calls ./spark-submit with a ProcessBuilder wrapper on it but I am hoping it gives better life cycle management through the various listeners. – Kumar Vaibhav May 05 '17 at 03:50

3 Answers


If it works in the console but not from your program, you may need to tell SparkLauncher where your Spark home is:

.setSparkHome("C:/tmp/spark-1.6.0-bin-hadoop2.6")

But there could be other things going wrong. You may want to capture additional debugging information by using:

.addSparkArg("--verbose")

and

Map<String, String> env = new HashMap<>();
env.put("SPARK_PRINT_LAUNCH_COMMAND", "1");

Pass the env object to the SparkLauncher constructor:

new SparkLauncher(env)
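
Putting the pieces together, a minimal sketch (the paths and class names are taken from the question, so adjust them to your setup):

import java.util.HashMap;
import java.util.Map;

Map<String, String> env = new HashMap<>();
env.put("SPARK_PRINT_LAUNCH_COMMAND", "1"); // makes spark-submit print the final command it runs

SparkAppHandle handle = new SparkLauncher(env)
    .setSparkHome("C:/tmp/spark-1.6.0-bin-hadoop2.6")
    .setAppResource("file://c:/temp/my.jar")
    .setMainClass("my.spark.app.Main")
    .setMaster("spark://master:7077")
    .addSparkArg("--verbose")
    .startApplication();
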
Dipaan

Where do you place the new SparkLauncher() statement in your program?

If the main program/unit test finishes immediately after invoking .startApplication(), then the child process it creates is terminated as well.

You can check the state of the job with the handle it returns:

SparkAppHandle handle = new SparkLauncher()
    .setAppName("test word count")
    .setAppResource("file://c:/temp/my.jar")
    .setMainClass("my.spark.app.Main")
    .setMaster("spark://master:7077")
    .startApplication();

handle.getState();  // immediately returns UNKNOWN

Thread.sleep(1000); // wait a little bit...

handle.getState();  // the state may have changed to CONNECTED or others

I think this is because the application takes some time to connect to the master; if the program ends before the connection is established, no job is submitted.
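
One way to avoid this is to keep the parent JVM alive until the job reaches a final state. A sketch using the listener variant of startApplication() with a CountDownLatch (the latch is my addition, not part of the launcher API):

import java.util.concurrent.CountDownLatch;

final CountDownLatch done = new CountDownLatch(1);
SparkAppHandle handle = new SparkLauncher()
    .setAppName("test word count")
    .setAppResource("file://c:/temp/my.jar")
    .setMainClass("my.spark.app.Main")
    .setMaster("spark://master:7077")
    .startApplication(new SparkAppHandle.Listener() {
      @Override public void stateChanged(SparkAppHandle h) {
        if (h.getState().isFinal()) done.countDown(); // FINISHED, FAILED or KILLED
      }
      @Override public void infoChanged(SparkAppHandle h) { }
    });
done.await(); // block here instead of exiting right away (throws InterruptedException)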

ICS

You need to wait for the launcher to connect to the driver and report your app ID and status. For that you can use a while loop or something similar, e.g.:

while (!handle.getState().isFinal()) {
    logger.info("Current state: " + handle.getState());
    logger.info("App Id: " + handle.getAppId());
    Thread.sleep(1000L);
    // other things you want to do
}
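
Note that handle.getAppId() returns null until the application has connected and reported its ID, so expect a few null entries in the log at first.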