
I am wondering if it is possible to submit, monitor & kill Spark applications from another service.

My requirements are as follows:

I wrote a service that:

  1. parses user commands
  2. translates them into arguments understandable by an already prepared Spark-SQL application
  3. submits the application along with the arguments to the Spark cluster, using spark-submit from ProcessBuilder (see the sketch after this list)
  4. plans to run the generated application's driver in cluster mode
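
For reference, here is a rough sketch of how step 3 currently looks on my side; the main class, jar path, and master URL below are placeholders rather than my real values:

import scala.collection.JavaConverters._

// Rough sketch of step 3 -- class name, jar path, and master URL are placeholders.
val cmd = Seq(
  "spark-submit",
  "--master", "spark://master-host:7077",        // placeholder standalone master
  "--deploy-mode", "cluster",
  "--class", "com.example.PreparedSparkSqlApp",  // placeholder main class
  "hdfs:///apps/prepared-spark-sql-app.jar",     // placeholder application jar
  "arg1", "arg2"                                 // arguments produced in step 2
)

val process = new ProcessBuilder(cmd.asJava)
  .redirectErrorStream(true)   // merge stderr into stdout
  .start()

// Blocks until spark-submit returns; in cluster mode this only covers the submission itself.
val exitCode = process.waitFor()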

Other requirements are:

  • Query an application's status, for example the percentage of work remaining
  • Kill queries accordingly

What I found in the Spark standalone documentation suggests killing an application using:

./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>

The driver ID should be found through the standalone Master web UI at http://<master url>:8080.

So, what am I supposed to do?

Related SO questions:
Spark application finished callback
Deploy Apache Spark application from another application in Java, best practice

yjshen

7 Answers

You could use shell scripts to do this.

The deploy script:

#!/bin/bash

spark-submit --class "xx.xx.xx" \
        --deploy-mode cluster \
        --supervise \
        --executor-memory 6G hdfs:///spark-stat.jar > output 2>&1

cat output

and you will get output like this:

16/06/23 08:37:21 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://node-1:6066.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Submission successfully created as driver-20160623083722-0026. Polling submission state...
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Submitting a request for the status of submission driver-20160623083722-0026 in spark://node-1:6066.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: State of driver driver-20160623083722-0026 is now RUNNING.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Driver is running on worker worker-20160621162532-192.168.1.200-7078 at 192.168.1.200:7078.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20160623083722-0026",
  "serverSparkVersion" : "1.6.0",
  "submissionId" : "driver-20160623083722-0026",
  "success" : true
}

Based on this output, create your kill-driver script:

#!/bin/bash

driverid=$(grep submissionId output | grep -Po 'driver-\d+-\d+')

spark-submit --master spark://node-1:6066 --kill $driverid

Make sure to give the scripts execute permission using chmod +x.
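
A rough Scala sketch of driving these scripts from the submitting service, assuming the deploy script is saved as ./deploy.sh and the master's REST port is node-1:6066: it runs the deploy script, pulls the driver ID out of the CreateSubmissionResponse, and then uses spark-submit's --status / --kill flags.

import scala.sys.process._

// Assumed names: ./deploy.sh is the deploy script above, node-1:6066 the REST port.
val submitOutput = "./deploy.sh".!!

// Extract the driver id from the CreateSubmissionResponse echoed by the script.
val driverId = "driver-\\d+-\\d+".r
  .findFirstIn(submitOutput)
  .getOrElse(sys.error("no driver id found in spark-submit output"))

// Poll the driver's state, or kill it.
Seq("spark-submit", "--master", "spark://node-1:6066", "--status", driverId).!
Seq("spark-submit", "--master", "spark://node-1:6066", "--kill", driverId).!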

pinkdawn

A "dirty" trick to kill Spark apps is to kill the JVM process named SparkSubmit (as listed by jps). The main problem is that the app will be "killed", but in the Spark master log it will appear as "finished"...

user@user:~$ jps  
20894 Jps
20704 SparkSubmit

user@user:~$ kill 20704

To be honest I don't like this solution, but for now it is the only way I know to kill an app.

Dharman

Here's what I do:

  1. To submit apps, use the (hidden) Spark REST Submission API: http://arturmkrtchyan.com/apache-spark-hidden-rest-api

    • This way you get a DriverID (under submissionId), which you can use to kill your job later (you shouldn't kill the Application, especially if you're using "supervise" in standalone mode)
    • This API also lets you query the driver status (a minimal sketch of the status and kill calls follows this list)
  2. Query the status of apps using the (also hidden) UI JSON API: http://[master-node]:[master-ui-port]/json/

    • This service exposes all information available on the master UI in JSON format.
  3. You can also use the "public" REST API to query Applications on Master or Executors on each worker, but this won't expose Drivers (at least not as of Spark 1.6)
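
A minimal sketch of the status and kill calls, following the endpoints described in the linked post and assuming the default REST port 6066 plus a driver ID obtained at submission time (host names here are placeholders):

import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Assumed host/port and driver id -- substitute your own.
val restUrl  = "http://master-host:6066"   // REST submission server, not the 8080 web UI
val driverId = "driver-20160623083722-0026"

// Driver status: plain GET.
println(Source.fromURL(s"$restUrl/v1/submissions/status/$driverId").mkString)

// Kill the driver: POST with an empty body.
val conn = new URL(s"$restUrl/v1/submissions/kill/$driverId")
  .openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
conn.setDoOutput(true)
conn.getOutputStream.close()               // send the empty body
println(Source.fromInputStream(conn.getInputStream).mkString)

// The master UI's JSON mirror from point 2 is also a plain GET.
val clusterState = Source.fromURL("http://master-host:8080/json/").mkString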

Roman G.

You can fire YARN commands from ProcessBuilder to list the applications, filter by the application name that is available to you, extract the appId, and then use YARN commands to poll the status, kill the application, etc.
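
A rough sketch of what that could look like in Scala, assuming the yarn CLI is on the PATH and using a hypothetical application name:

import scala.sys.process._

// Hypothetical application name -- use whatever name your service submits with.
val appName = "prepared-spark-sql-app"

// List YARN applications and pick the first line mentioning our app name;
// the first column of `yarn application -list` output is the application id.
val listing = Seq("yarn", "application", "-list").!!
val appId = listing.split("\n")
  .find(_.contains(appName))
  .map(_.trim.split("\\s+").head)
  .getOrElse(sys.error(s"no running application named $appName"))

// Poll its status, or kill it.
println(Seq("yarn", "application", "-status", appId).!!)
Seq("yarn", "application", "-kill", appId).!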

urug

kill -9 $(jps | grep SparkSubmit | grep -Eo '[0-9]{1,7}')
Dharman

You can find the driver ID in [spark]/work/; the ID is the directory name. Then kill the job with spark-submit --kill.
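
A small sketch of that idea, with an assumed work directory and master URL (adjust to your SPARK_HOME and cluster):

import java.io.File
import scala.sys.process._

// Assumed locations -- adjust to your installation.
val workDir   = new File("/opt/spark/work")
val masterUrl = "spark://master-host:6066"

// Each driver gets a directory named after its id, e.g. driver-20160623083722-0026.
val driverIds = workDir.listFiles().map(_.getName).filter(_.startsWith("driver-"))

// Kill a chosen driver via spark-submit.
driverIds.headOption.foreach { id =>
  Seq("spark-submit", "--master", masterUrl, "--kill", id).!
}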

petertc

I had the same kind of problem: I needed to map my application-id to its driver-id and write them to a CSV for another application to use, in standalone mode.

I was able to get the application id easily by using sparkContext.applicationId.

To get the driver-id I used the shell command pwd: when your program runs in cluster mode, the driver logs are written to a directory named after the driver-id, so I extracted that folder name to get the driver-id.

import scala.sys.process._

// the driver's working directory is named after its driver-id
val pwdCmd = "pwd"
val getDriverId = pwdCmd.!!
val driverId = getDriverId.trim.split("/").last
Rakesh