19

I'm running a Spark cluster in standalone mode.

I've submitted a Spark application in cluster mode using options:

--deploy-mode cluster --supervise

So that the job is fault tolerant.

Now I need to keep the cluster running but stop the application from running.

Things I have tried:

  • Stopping the cluster and restarting it, but the application resumes execution when I do that.
  • Using kill -9 on a daemon named DriverWrapper, but the job resumes again after that.
  • Removing temporary files and directories and restarting the cluster, but the job resumes again.

So the running application is really fault tolerant.

Question: Based on the above scenario, can someone suggest how I can stop the job from running, or what else I can try to stop the application while keeping the cluster running?

Something just occurred to me: if I call sparkContext.stop(), that should do it, but that requires a bit of work in the code, which is OK. Can you suggest any other way that doesn't require a code change?

jakstack
  • [Here's a solution](https://stackoverflow.com/a/45947979/5513168) using Spark REST API for Spark Standalone clusters. – Lana Nova Aug 29 '17 at 20:43

3 Answers

16

If you wish to kill an application that is failing repeatedly, you may do so through:

./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>

You can find the driver ID through the standalone Master web UI at http://<master url>:8080.

From the Spark documentation.
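
For instance, here is a minimal sketch of scripting that command (the master URL and driver ID below are illustrative placeholders, not values from this question; the driver ID is the one shown on the Master web UI):

#!/usr/bin/python
import subprocess

# Placeholder values: substitute your own standalone master URL and the
# driver ID listed on the Master web UI.
master_url = "spark://my-master-host:7077"
driver_id = "driver-20160329182400-0001"

# Shell out to spark-class exactly as in the command above; raises on failure.
subprocess.check_call([
    "./bin/spark-class", "org.apache.spark.deploy.Client", "kill",
    master_url, driver_id,
])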

yjshen
  • Do you happen to know if there's a way to stop it without the driver id? The id changes each time. I want to have a Jenkins job that runs spark-submit but kills the previous process before submitting the new one. – nickn Mar 29 '16 at 18:24
  • What does the master url look like? What do the ports look like? Running with mesos it doesn't recognize --master mesos://... like other commands. spark-submit seems to have a kill option. – Justin Thomas Aug 02 '16 at 22:16
  • I have the same problem. Unfortunately I'm not running the cluster and this cluster is run without giving the UI permission to kill. When I run this command I get "Driver app-XXX has already finished or does not exist" (while I see it on the browser UI), I've also tried using spark-submit with --kill option and had no luck. Is there anything else I can try? – user2662165 Dec 31 '16 at 18:41
  • @nickn Check out my answer, I was having the same issue and was able to work through it after a while - although in a non-supported way. – Nathan Loyer Jan 05 '17 at 16:03
  • @user2662165 Any way of killing it using spark-class, spark-submit, or the submissions api endpoint is not going to work unless you submit your app in cluster mode. I struggled to grasp that as well. If you need to kill a driver run in client mode (the default), you have to use OS commands to kill the process manually. – Nathan Loyer Jan 05 '17 at 16:04
  • The port to use with master url is 7077. – Behroz Sikander May 28 '18 at 14:21
6

Revisiting this because I wasn't able to use the existing answer without debugging a few things.

My goal was to programmatically kill a driver that runs persistently, once a day, deploy any updates to the code, then restart it, so I won't know ahead of time what my driver ID is. It took me some time to figure out that you can only kill drivers that were submitted with the --deploy-mode cluster option. It also took me some time to realize that there is a difference between the application ID and the driver ID: while you can easily correlate an application name with an application ID, I have yet to find a way to divine the driver ID through the API endpoints and correlate it to either an application name or the class you are running. So while spark-class org.apache.spark.deploy.Client kill <master url> <driver ID> works, you need to make sure you are deploying your driver in cluster mode and are using the driver ID, not the application ID.

Additionally, there is a submission endpoint that spark provides by default at http://<spark master>:6066/v1/submissions and you can use http://<spark master>:6066/v1/submissions/kill/<driver ID> to kill your driver.
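
For example, a bare-bones kill request against that endpoint could look like this (assuming the REST submission server is enabled on its default port 6066; the host and driver ID are placeholders):

#!/usr/bin/python
import requests, json

spark_master = "my-master-host"                # placeholder master host
driver_to_kill = "driver-20170105160300-0002"  # placeholder driver ID from the web UI

# POST to the submission server's kill endpoint and pretty-print the JSON response.
result = requests.post("http://" + spark_master + ":6066/v1/submissions/kill/" + driver_to_kill)
print(json.dumps(json.loads(result.text), indent=4))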

Since I wasn't able to find the driver ID that correlated to a specific job from any api endpoint, I wrote a python web scraper to get the info from the basic spark master web page at port 8080 then kill it using the endpoint at port 6066. I'd prefer to get this data in a supported way, but this is the best solution I could find.

#!/usr/bin/python

import sys, re, requests, json
from selenium import webdriver

# Fully qualified main class names of the drivers to kill, passed as
# command-line arguments (skip argv[0], which is this script's own path).
classes_to_kill = sys.argv[1:]
spark_master = 'masterurl'

# Render the standalone Master web UI (port 8080) in a headless browser.
driver = webdriver.PhantomJS()
driver.get("http://" + spark_master + ":8080/")

# Walk the "Running Drivers" table and pick out the driver IDs whose
# main class matches one of the requested classes.
for running_driver in driver.find_elements_by_xpath("//*/div/h4[contains(text(), 'Running Drivers')]"):
    for driver_id in running_driver.find_elements_by_xpath("..//table/tbody/tr/td[contains(text(), 'driver-')]"):
        for class_to_kill in classes_to_kill:
            right_class = driver_id.find_elements_by_xpath("../td[text()='" + class_to_kill + "']")
            if len(right_class) > 0:
                driver_to_kill = re.search(r'^driver-\S+', driver_id.text).group(0)
                print "Killing " + driver_to_kill
                # Kill the driver through the REST submission endpoint on port 6066.
                result = requests.post("http://" + spark_master + ":6066/v1/submissions/kill/" + driver_to_kill)
                print json.dumps(json.loads(result.text), indent=4)

driver.quit()
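
A possible invocation (the script file name and the class are placeholders I'm assuming, since the snippet reads the class names from the command line): python kill_drivers.py com.example.MyStreamingJob. Every running driver whose main class matches one of the arguments is then killed through the port 6066 endpoint.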
Nathan Loyer
  • These days I am using DC/OS & mesos to run the spark jobs, and mesos provides an endpoint to get all of the running mesos frameworks. From there you can easily grab the spark job name and the driver ID. The mesos endpoint is `/state-summary`, which returns json like `{"frameworks":[{"name":"...","id":"..."}]}`. The id has more than just the driver ID in it, but it contains "driver-"; everything after that point is your driver ID. – Nathan Loyer Jul 31 '18 at 16:24
  • The post method works in case of mesos based deployments, thanks. – Ja8zyjits Oct 25 '21 at 09:44
3

https://community.cloudera.com/t5/Support-Questions/What-is-the-correct-way-to-start-stop-spark-streaming-jobs/td-p/30183

According to this link, if your master is YARN, you can stop the application with:

yarn application -list

yarn application -kill application_id
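
If you want to script that, a small sketch could look like the following (the application name is a placeholder; it simply filters yarn application -list by name and kills each match):

#!/usr/bin/python
import subprocess

app_name = "MySparkApp"  # placeholder: the name the application was submitted with

# List YARN applications and kill every one whose listing line mentions the name.
listing = subprocess.check_output(["yarn", "application", "-list"]).decode()
for line in listing.splitlines():
    if app_name in line:
        app_id = line.split()[0]  # first column is the application_<ts>_<n> ID
        subprocess.check_call(["yarn", "application", "-kill", app_id])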
wyx