
I am wondering if it is possible to submit, monitor & kill Spark applications from another service.

My requirements are as follows:

I wrote a service that:

  1. parses user commands
  2. translates them into arguments understandable by an already prepared Spark-SQL application
  3. submits the application along with the arguments to the Spark cluster, using spark-submit from ProcessBuilder (see the sketch after this list)
  4. plans to run the generated application's driver in cluster mode
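
For reference, here is a rough sketch of how step 3 currently looks on my side; the main class, jar path, and master URL below are placeholders rather than my real values:

import scala.collection.JavaConverters._

// Rough sketch of step 3 -- class name, jar path, and master URL are placeholders.
val cmd = Seq(
  "spark-submit",
  "--master", "spark://master-host:7077",        // placeholder standalone master
  "--deploy-mode", "cluster",
  "--class", "com.example.PreparedSparkSqlApp",  // placeholder main class
  "hdfs:///apps/prepared-spark-sql-app.jar",     // placeholder application jar
  "arg1", "arg2"                                 // arguments produced in step 2
)

val process = new ProcessBuilder(cmd.asJava)
  .redirectErrorStream(true)   // merge stderr into stdout
  .start()

// Blocks until spark-submit returns; in cluster mode this only covers the submission itself.
val exitCode = process.waitFor()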

Other requirements are:

  • Query an application's status, for example the percentage of work remaining
  • Kill queries accordingly

What I found in the Spark standalone documentation suggests killing an application using:

./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>

The driver ID should be found through the standalone Master web UI at http://<master url>:8080.

So, what am I supposed to do?

Related SO questions:
Spark application finished callback
Deploy Apache Spark application from another application in Java, best practice

yjshen

7 Answers

You could use shell scripts to do this.

The deploy script:

#!/bin/bash

spark-submit --class "xx.xx.xx" \
        --deploy-mode cluster \
        --supervise \
        --executor-memory 6G hdfs:///spark-stat.jar > output 2>&1

cat output

and you will get output like this:

16/06/23 08:37:21 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://node-1:6066.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Submission successfully created as driver-20160623083722-0026. Polling submission state...
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Submitting a request for the status of submission driver-20160623083722-0026 in spark://node-1:6066.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: State of driver driver-20160623083722-0026 is now RUNNING.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Driver is running on worker worker-20160621162532-192.168.1.200-7078 at 192.168.1.200:7078.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20160623083722-0026",
  "serverSparkVersion" : "1.6.0",
  "submissionId" : "driver-20160623083722-0026",
  "success" : true
}

Based on this output, create your kill-driver script:

#!/bin/bash

driverid=$(grep submissionId output | grep -Po 'driver-\d+-\d+')

spark-submit --master spark://node-1:6066 --kill $driverid

Make sure to give the scripts execute permission using chmod +x.
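
A rough Scala sketch of driving these scripts from the submitting service, assuming the deploy script is saved as ./deploy.sh and the master's REST port is node-1:6066: it runs the deploy script, pulls the driver ID out of the CreateSubmissionResponse, and then uses spark-submit's --status / --kill flags.

import scala.sys.process._

// Assumed names: ./deploy.sh is the deploy script above, node-1:6066 the REST port.
val submitOutput = "./deploy.sh".!!

// Extract the driver id from the CreateSubmissionResponse echoed by the script.
val driverId = "driver-\\d+-\\d+".r
  .findFirstIn(submitOutput)
  .getOrElse(sys.error("no driver id found in spark-submit output"))

// Poll the driver's state, or kill it.
Seq("spark-submit", "--master", "spark://node-1:6066", "--status", driverId).!
Seq("spark-submit", "--master", "spark://node-1:6066", "--kill", driverId).!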

pinkdawn

A "dirty" trick to kill Spark apps is to kill the JVM process named SparkSubmit (as listed by jps). The main problem is that the app will be "killed", but in the Spark master log it will appear as "finished"...

user@user:~$ jps  
20894 Jps
20704 SparkSubmit

user@user:~$ kill 20704

To be honest I don't like this solution, but for now it is the only way I know to kill an app.

Dharman

Here's what I do:

  1. To submit apps, use the (hidden) Spark REST Submission API: http://arturmkrtchyan.com/apache-spark-hidden-rest-api

    • This way you get a DriverID (under submissionId), which you can use to kill your job later (you shouldn't kill the Application, especially if you're using "supervise" in standalone mode)
    • This API also lets you query the driver status (a minimal sketch of the status and kill calls follows this list)
  2. Query the status of apps using the (also hidden) UI JSON API: http://[master-node]:[master-ui-port]/json/

    • This service exposes all information available on the master UI in JSON format.
  3. You can also use the "public" REST API to query Applications on Master or Executors on each worker, but this won't expose Drivers (at least not as of Spark 1.6)
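
A minimal sketch of the status and kill calls, following the endpoints described in the linked post and assuming the default REST port 6066 plus a driver ID obtained at submission time (host names here are placeholders):

import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Assumed host/port and driver id -- substitute your own.
val restUrl  = "http://master-host:6066"   // REST submission server, not the 8080 web UI
val driverId = "driver-20160623083722-0026"

// Driver status: plain GET.
println(Source.fromURL(s"$restUrl/v1/submissions/status/$driverId").mkString)

// Kill the driver: POST with an empty body.
val conn = new URL(s"$restUrl/v1/submissions/kill/$driverId")
  .openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
conn.setDoOutput(true)
conn.getOutputStream.close()               // send the empty body
println(Source.fromInputStream(conn.getInputStream).mkString)

// The master UI's JSON mirror from point 2 is also a plain GET.
val clusterState = Source.fromURL("http://master-host:8080/json/").mkString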

Roman G.

You can fire YARN commands from ProcessBuilder to list the applications, filter by the application name that is available to you, extract the appId, and then use YARN commands to poll the status, kill the application, etc.
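
A rough sketch of what that could look like in Scala, assuming the yarn CLI is on the PATH and using a hypothetical application name:

import scala.sys.process._

// Hypothetical application name -- use whatever name your service submits with.
val appName = "prepared-spark-sql-app"

// List YARN applications and pick the first line mentioning our app name;
// the first column of `yarn application -list` output is the application id.
val listing = Seq("yarn", "application", "-list").!!
val appId = listing.split("\n")
  .find(_.contains(appName))
  .map(_.trim.split("\\s+").head)
  .getOrElse(sys.error(s"no running application named $appName"))

// Poll its status, or kill it.
println(Seq("yarn", "application", "-status", appId).!!)
Seq("yarn", "application", "-kill", appId).!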

urug

kill -9 $(jps | grep SparkSubmit | grep -Eo '[0-9]{1,7}')
Dharman

You can find the driver ID in [spark]/work/; the ID is the directory name. Then kill the job with spark-submit --kill.
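
A small sketch of that idea, with an assumed work directory and master URL (adjust to your SPARK_HOME and cluster):

import java.io.File
import scala.sys.process._

// Assumed locations -- adjust to your installation.
val workDir   = new File("/opt/spark/work")
val masterUrl = "spark://master-host:6066"

// Each driver gets a directory named after its id, e.g. driver-20160623083722-0026.
val driverIds = workDir.listFiles().map(_.getName).filter(_.startsWith("driver-"))

// Kill a chosen driver via spark-submit.
driverIds.headOption.foreach { id =>
  Seq("spark-submit", "--master", masterUrl, "--kill", id).!
}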

petertc

I had the same kind of problem: I needed to map my application-id to its driver-id and write them to a CSV for another application to use, in standalone mode.

I was able to get the application id easily by using sparkContext.applicationId.

To get the driver-id I used the shell command pwd: when your program runs in cluster mode, the driver logs are written to a directory named after the driver-id, so I extracted that folder name to get the driver-id.

import scala.sys.process._

// the driver's working directory is named after its driver-id
val pwdCmd = "pwd"
val getDriverId = pwdCmd.!!
val driverId = getDriverId.trim.split("/").last
Rakesh