36

To check running applications in Apache Spark, one can use the web interface at the URL:

http://<master>:8080

My question is: how can we check running applications from the terminal? Is there any command that returns the application status?


4 Answers

21

If it's for the Spark Standalone or Apache Mesos cluster managers, @sb0709's answer is the way to go.

For YARN, you should use the yarn application command:

$ yarn application -help
usage: application
 -appStates <States>             Works with -list to filter applications
                                 based on input comma-separated list of
                                 application states. The valid application
                                 state can be one of the following:
                                 ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUN
                                 NING,FINISHED,FAILED,KILLED
 -appTypes <Types>               Works with -list to filter applications
                                 based on input comma-separated list of
                                 application types.
 -help                           Displays help for all commands.
 -kill <Application ID>          Kills the application.
 -list                           List applications. Supports optional use
                                 of -appTypes to filter applications based
                                 on application type, and -appStates to
                                 filter applications based on application
                                 state.
 -movetoqueue <Application ID>   Moves the application to a different
                                 queue.
 -queue <Queue Name>             Works with the movetoqueue command to
                                 specify which queue to move an
                                 application to.
 -status <Application ID>        Prints the status of the application.
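
For example, to list only the Spark applications that are currently running on YARN, and then print the status of one of them (the application ID below is just a placeholder):

$ yarn application -list -appTypes SPARK -appStates RUNNING
$ yarn application -status application_1504012345678_0001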
10

You can use spark-submit --status (as described in Mastering Apache Spark 2.0).

spark-submit --status [submission ID]
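
For example, a minimal sketch against a standalone master (the host and submission ID are placeholders; depending on your setup, the REST submission port 6066 may be required instead of the default master port 7077):

spark-submit --master spark://<master>:7077 --status driver-20170829014216-0001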

See the code of spark-submit for reference:

if (!master.startsWith("spark://") && !master.startsWith("mesos://")) {
  SparkSubmit.printErrorAndExit(
    "Requesting submission statuses is only supported in standalone or Mesos mode!")
}
  • Note that (according to the link) the `--status` option only works for Spark standalone or Mesos with cluster deploy mode (not YARN) – DNA May 24 '16 at 18:18
  • When setting the `--master` argument, is the port of the listener the same one used when submitting the application? (say 7077) – Cavaz Jun 27 '19 at 12:11
3

I have found that it is possible to use the REST API to submit, kill, and get the status of Spark jobs. The REST API is exposed by the master on port 6066.

  1. To create the job, use the following curl command:

    curl -X POST http://spark-cluster-ip:6066/v1/submissions/create \
      --header "Content-Type:application/json;charset=UTF-8" \
      --data '{
          "action" : "CreateSubmissionRequest",
          "appArgs" : [ "blah" ],
          "appResource" : "path-to-jar-file",
          "clientSparkVersion" : "2.2.0",
          "environmentVariables" : { "SPARK_ENV_LOADED" : "1" },
          "mainClass" : "app-class",
          "sparkProperties" : {
              "spark.jars" : "path-to-jar-file",
              "spark.driver.supervise" : "false",
              "spark.app.name" : "app-name",
              "spark.submit.deployMode" : "cluster",
              "spark.master" : "spark://spark-master-ip:6066"
          }
      }'

    The response indicates the success or failure of the above operation and includes the submissionId:

    {
       "submissionId" : "driver-20170829014216-0001",
       "serverSparkVersion" : "2.2.0",
       "success" : true,
       "message" : "Driver successfully submitted as driver-20170829014216-0001",
       "action" : "CreateSubmissionResponse"
    }
  2. To kill the job, use the submissionId obtained above:

     curl -X POST http://spark-cluster-ip:6066/v1/submissions/kill/driver-20170829014216-0001

    The response again contains success/failure status:

    {
         "success" : true,
         "message" : "Kill request for driver-20170829014216-0001 submitted",
         "action" : "KillSubmissionResponse",
         "serverSparkVersion" : "2.2.0",
         "submissionId" : "driver-20170829014216-0001"
    }
  3. To get the status, use the following command:

    curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20170829014216-0001
    

    The response includes the driver state, i.e. the current status of the app (see also the polling sketch after this list):

    {
      "action" : "SubmissionStatusResponse",
      "driverState" : "RUNNING",
      "serverSparkVersion" : "2.2.0",
      "submissionId" : "driver-20170829203736-0004",
      "success" : true,
      "workerHostPort" : "10.32.1.18:38317",
      "workerId" : "worker-20170829013941-10.32.1.18-38317"
    }
    

I found out about the REST API here.
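
Building on step 3, here is a minimal sketch that polls the status endpoint until the driver leaves the RUNNING state. The host and submission ID are the placeholders used above, and the grep-based parsing assumes the pretty-printed JSON shown (jq would be more robust):

    SUBMISSION_ID=driver-20170829014216-0001
    while true; do
      STATE=$(curl -s http://spark-cluster-ip:6066/v1/submissions/status/$SUBMISSION_ID \
        | grep -o '"driverState" : "[A-Z_]*"' | cut -d '"' -f 4)
      echo "driverState=$STATE"
      [ "$STATE" = "RUNNING" ] || break
      sleep 10
    done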

2

In my case, my Spark application runs remotely on Amazon AWS EMR, so I use the Lynx command-line browser to check the Spark application's status. After you have submitted your Spark job from one terminal, open another terminal and run the following command from the new terminal.

   lynx http://localhost:<4043 or other Spark job port>
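
If you only need the status from a script rather than an interactive page, the same UI port also serves Spark's monitoring REST API, which returns the running application(s) as JSON (a sketch, assuming the driver UI is reachable on that port):

   curl http://localhost:4043/api/v1/applications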