1

I'm submitting my jobs to a Spark cluster (with YARN) programmatically from a Java app using SparkLauncher (starting the job with startApplication(), not launch()). I would like to capture all the log output that the launched job produces on stdout and stderr in a file that I can access from the Java app. I don't want to change the global Spark log config; I want a dynamic solution that I can control from the Java app, based on variables that change on every single execution.

Following the documentation, this should be possible by using the CHILD_PROCESS_LOGGER_NAME option. So I defined a java.util.logging.Logger as described there and added this code to my job launcher:

SparkLauncher.setConfig(SparkLauncher.CHILD_PROCESS_LOGGER_NAME, "MyLog");

But this doesn't work; the log file is empty. I also tried other methods such as setConf(...) and addSparkArg(...), without success. What did I do wrong? Or should I rather use log4j, create a custom configuration, and pass it to the launcher somehow? If so, how can I do that from my Java app?
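For reference, a minimal, self-contained sketch of wiring a java.util.logging.Logger named "MyLog" to a file, as described above. The file name is a placeholder, and the SparkLauncher call is shown only as a comment at the point where it would go (the launcher itself needs the Spark jars on the classpath):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class LauncherLogSetup {

    // Create a j.u.l. logger with the given name that writes to a file.
    static Logger createFileLogger(String name, String file) throws IOException {
        Logger log = Logger.getLogger(name);
        FileHandler handler = new FileHandler(file); // FileHandler flushes after each record
        handler.setFormatter(new SimpleFormatter());
        log.addHandler(handler);
        return log;
    }

    public static void main(String[] args) throws IOException {
        Logger log = createFileLogger("MyLog", "spark-job.log");

        // The launcher is then pointed at this logger by name, e.g.:
        // SparkLauncher.setConfig(SparkLauncher.CHILD_PROCESS_LOGGER_NAME, "MyLog");
        // SparkAppHandle handle = launcher.startApplication();

        log.info("logger wired to file");
        System.out.println(
                Files.readString(Path.of("spark-job.log")).contains("logger wired to file"));
    }
}
```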

MUmla

2 Answers

1

Below is the code snippet I have been using to print SparkLauncher logs with slf4j-log4j:

private static final Logger LOGGER = LoggerFactory.getLogger(JobSubmitter.class);

SparkLauncher launcher = new SparkLauncher()............; // prepare launcher

launcher.redirectToLog(JobSubmitter.class.getName());
SparkAppHandle handler = launcher.startApplication();
while (handler.getState() == null || !handler.getState().isFinal()) {
    if (handler.getState() != null) {
        LOGGER.info("Job state is: {}", handler.getState());
        if (handler.getAppId() != null) {
            LOGGER.info("App id: {} :: state: {}", handler.getAppId(), handler.getState());
        }
    }
    // Pause between polls to reduce job status check frequency
    Thread.sleep(jobStatusCheckInterval == 0 ? DEFAULT_JOB_STATUS_CHECK_INTERVAL : jobStatusCheckInterval);
}

Add a comment in case you have any queries.

Rahul Sharma
  • I also discovered the redirectToLog() method when googling, but it isn't available for me, which had already confused me. Now I know why; I should have mentioned it: I'm using Spark 1.6, where this method isn't implemented! So the probability of finding a solution seems very low, I think :( Or is there still another way? – MUmla Aug 21 '17 at 20:50
  • 1
    Take a look at answer by TomaszGuzialek [Here](https://stackoverflow.com/questions/31754328/spark-launcher-waiting-for-job-completion-infinitely) he is using stream to fetch log from output/error stream. – Rahul Sharma Aug 22 '17 at 15:39
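The stream-based approach that comment points to can be sketched with plain JDK classes, which is useful on Spark 1.6 where redirectToLog() does not exist: launch() returns an ordinary java.lang.Process, so its stdout/stderr can be drained on background threads. The echo child below is just a runnable stand-in for the launched Spark process:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class StreamDrain {

    // Drain a stream line by line on a background thread so the child
    // process never blocks on a full pipe buffer.
    static Thread drain(InputStream in, List<String> sink) {
        Thread t = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = r.readLine()) != null) {
                    sink.add(line); // in the real app: LOGGER.info(line) or write to a file
                }
            } catch (IOException ignored) { }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for SparkLauncher#launch(), which also returns a java.lang.Process;
        // drain p.getErrorStream() the same way for stderr.
        Process p = new ProcessBuilder("echo", "hello from child").start();
        List<String> out = new ArrayList<>();
        Thread drainer = drain(p.getInputStream(), out);
        p.waitFor();
        drainer.join();
        System.out.println(out.get(0));
    }
}
```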
0

I tried redirectOutput(java.io.File outFile) and was able to get all of SparkLauncher's logging into outFile.
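A minimal sketch of what that redirection looks like, using the JDK's ProcessBuilder#redirectOutput(File) — the same child-process mechanism that SparkLauncher's redirectOutput(java.io.File) (available from Spark 2.x) configures for you. The echo child and file name are placeholders:

```java
import java.io.File;
import java.nio.file.Files;

public class RedirectDemo {
    public static void main(String[] args) throws Exception {
        File outFile = new File("launcher-out.log");

        // Send the child's stdout to outFile; fold stderr into the same stream.
        Process p = new ProcessBuilder("echo", "captured")
                .redirectErrorStream(true)
                .redirectOutput(outFile)
                .start();
        p.waitFor();

        // The child's output is now in the file.
        System.out.println(Files.readAllLines(outFile.toPath()).get(0));
    }
}
```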

khushbu kanojia