
I am trying to avoid a "while(true)" loop while waiting for my Apache Spark job to finish, but without success.

I have a Spark application that is supposed to process some data and put the result into a database. I call it from my Spring service and would like to wait until the job is done.

Example:

Launcher with a method:

@Override
public void run(UUID docId, String query) throws Exception {
    launcher.addAppArgs(docId.toString(), query);

    SparkAppHandle sparkAppHandle = launcher.startApplication();

    sparkAppHandle.addListener(new SparkAppHandle.Listener() {
        @Override
        public void stateChanged(SparkAppHandle handle) {
            System.out.println("state changed: " + handle.getState());
        }

        @Override
        public void infoChanged(SparkAppHandle handle) {
            System.out.println("info changed: " + handle.getState());
        }
    });

    System.out.println(sparkAppHandle.getState().toString());
}

How do I properly wait until the state of the handle is FINISHED?


2 Answers


I am also using SparkLauncher from a Spring application. Here is a summary of the approach I took (following the examples in the Javadoc).

The @Service used to launch the job also implements SparkAppHandle.Listener and passes a reference to itself via .startApplication, e.g.:

...
...
@Service
public class JobLauncher implements SparkAppHandle.Listener {
...
...
...
private SparkAppHandle launchJob(String mainClass, String[] args) throws Exception {

    String appResource = getAppResourceName();

    SparkAppHandle handle = new SparkLauncher()
        .setAppResource(appResource).addAppArgs(args)
        .setMainClass(mainClass)
        .setMaster(sparkMaster)
        .setDeployMode(sparkDeployMode)
        .setSparkHome(sparkHome)
        .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
        .startApplication(this);

    LOG.info("Launched [" + mainClass + "] from [" + appResource + "] State [" + handle.getState() + "]");

    return handle;
}

/**
* Callback method for changes to the Spark Job
*/
@Override
public void infoChanged(SparkAppHandle handle) {

    LOG.info("Spark App Id [" + handle.getAppId() + "] Info Changed.  State [" + handle.getState() + "]");

}

/**
* Callback method for changes to the Spark Job's state
*/
@Override
public void stateChanged(SparkAppHandle handle) {

    LOG.info("Spark App Id [" + handle.getAppId() + "] State Changed. State [" + handle.getState() + "]");

}

Using this approach, one can take action when the state changes to FAILED, FINISHED, or KILLED.
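If you additionally want the calling thread to block until a final state is reached, the classic wait/notify pattern fits naturally inside the listener. Below is a minimal, self-contained sketch of that pattern; `FakeHandle` is a hypothetical stand-in for the small subset of `SparkAppHandle` used here, so the example runs without a Spark dependency. With the real API you would check `handle.getState().isFinal()` inside `stateChanged` instead.

```java
// Sketch: block until a listener callback reports a final state.
// FakeHandle stands in for the subset of SparkAppHandle used here (assumption).
public class ListenerWaitSketch {

    static class FakeHandle {
        volatile String state = "CONNECTED";
        String getState() { return state; }
        boolean isFinal() {
            return state.equals("FINISHED") || state.equals("FAILED") || state.equals("KILLED");
        }
    }

    private final Object lock = new Object();

    // Analogous to SparkAppHandle.Listener.stateChanged(handle).
    void stateChanged(FakeHandle handle) {
        synchronized (lock) {
            if (handle.isFinal()) {
                lock.notifyAll(); // wake the thread blocked in waitForCompletion
            }
        }
    }

    // Blocks the caller until the handle reaches a final state.
    String waitForCompletion(FakeHandle handle) throws InterruptedException {
        synchronized (lock) {
            while (!handle.isFinal()) { // loop guards against spurious wakeups
                lock.wait();
            }
        }
        return handle.getState();
    }

    public static void main(String[] args) throws Exception {
        ListenerWaitSketch launcher = new ListenerWaitSketch();
        FakeHandle handle = new FakeHandle();
        // Simulate the Spark job finishing on another thread.
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            handle.state = "FINISHED";
            launcher.stateChanged(handle);
        }).start();
        System.out.println("Final state: " + launcher.waitForCompletion(handle));
    }
}
```

The `while` loop around `lock.wait()` matters: `wait()` can return spuriously, so the condition must be rechecked before proceeding.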

I hope this information is helpful to you.

  • I am also facing the same issue. I tried the OP's way (creating a new anonymous listener object) and the way you describe. In both cases, the listener methods were not getting invoked. – Reddy Aug 22 '16 at 20:14
  • @Reddy: were you able to get this working? I'm also facing the same issue. The application exits immediately without getting the appId and state. It works if I call Thread.sleep() explicitly. Thank you! – Tariq Nov 10 '16 at 20:43
  • How did you actually use this Job launcher class? – ascetic652 Aug 02 '18 at 14:05

I implemented this using a CountDownLatch, and it works as expected.

...
final CountDownLatch countDownLatch = new CountDownLatch(1);
SparkAppListener sparkAppListener = new SparkAppListener(countDownLatch);
SparkAppHandle appHandle = sparkLauncher.startApplication(sparkAppListener);
Thread sparkAppListenerThread = new Thread(sparkAppListener);
sparkAppListenerThread.start();
long timeout = 120;
countDownLatch.await(timeout, TimeUnit.SECONDS);
...

// Releases the latch once the application reaches a final state (FINISHED, FAILED, or KILLED).
private static class SparkAppListener implements SparkAppHandle.Listener, Runnable {
    private static final Log log = LogFactory.getLog(SparkAppListener.class);
    private final CountDownLatch countDownLatch;

    public SparkAppListener(CountDownLatch countDownLatch) {
        this.countDownLatch = countDownLatch;
    }
    @Override
    public void stateChanged(SparkAppHandle handle) {
        String sparkAppId = handle.getAppId();
        State appState = handle.getState();
        // SPARK_STATE_MSG is an application-defined Map<State, String> of
        // human-readable state descriptions, declared elsewhere in the class.
        if (sparkAppId != null) {
            log.info("Spark job with app id: " + sparkAppId + ",\t State changed to: " + appState + " - "
                    + SPARK_STATE_MSG.get(appState));
        } else {
            log.info("Spark job's state changed to: " + appState + " - " + SPARK_STATE_MSG.get(appState));
        }
        if (appState != null && appState.isFinal()) {
            countDownLatch.countDown();
        }
    }
    @Override
    public void infoChanged(SparkAppHandle handle) {}
    @Override
    public void run() {}
}
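One detail worth noting: `countDownLatch.await(timeout, TimeUnit.SECONDS)` returns a boolean that the snippet above discards. It returns `true` if the latch reached zero (the job hit a final state) and `false` if the timeout elapsed first, so checking it lets you distinguish a finished job from a timed-out one. A small self-contained demo of that behavior (no Spark involved):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchTimeoutDemo {
    public static void main(String[] args) throws InterruptedException {
        // Latch that is never counted down: await(...) times out and returns false.
        CountDownLatch neverReleased = new CountDownLatch(1);
        boolean completed = neverReleased.await(200, TimeUnit.MILLISECONDS);
        System.out.println("completed = " + completed); // false: treat as a job timeout

        // Latch released before the deadline: await(...) returns true.
        CountDownLatch released = new CountDownLatch(1);
        released.countDown();
        boolean done = released.await(200, TimeUnit.MILLISECONDS);
        System.out.println("done = " + done); // true: the job reached a final state
    }
}
```

In the answer's code, that means the `await` line could become `boolean finished = countDownLatch.await(timeout, TimeUnit.SECONDS);`, with `finished == false` signaling that the job did not complete within 120 seconds.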