
This might be a very simple question, but is there a simple way to measure the execution time of a Spark job (submitted using spark-submit)?

It would help us profile our Spark jobs based on the size of the input data.

EDIT: I use http://[driver]:4040 to monitor my jobs, but this web UI shuts down as soon as my job finishes.

pranav3688

3 Answers


Every SparkContext launches its own instance of the web UI, which is available at

http://[driver]:4040
by default (the port can be changed using spark.ui.port).

It offers pages (tabs) with the following information:

Jobs, Stages, Storage (with RDD size and memory use), Environment, Executors, SQL

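By default, this information is only available while the application is running.

Tip: You can still use the web UI after the application has finished by setting spark.eventLog.enabled to true and opening the logged application in the Spark History Server (port 18080 by default).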

Sample web UI where you can see the time as 3.2 hours: [screenshot]
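Once event logging is enabled, each application's events are written as one JSON object per line to a file under spark.eventLog.dir, and the application start and end events both carry a "Timestamp" field in epoch milliseconds. If you only need the end-to-end duration rather than the full UI, a minimal Scala sketch like the one below could read it straight from the log. It assumes an uncompressed, single-file (non-rolling) event log and the "Event"/"Timestamp" field names written by Spark's JSON event log; the object name EventLogDuration is hypothetical, and a regex stands in for a real JSON parser only to keep the sketch dependency-free.

```scala
// Hypothetical helper (not part of Spark): reads an uncompressed Spark event
// log, one JSON event per line, and returns the wall-clock time between the
// application start and end events.
object EventLogDuration {
  // Assumed format: {"Event":"SparkListenerApplicationStart",...,"Timestamp":<epoch ms>,...}
  private val EventField = "\"Event\"\\s*:\\s*\"([^\"]+)\"".r
  private val TimestampField = "\"Timestamp\"\\s*:\\s*(\\d+)".r

  private def timestampOf(line: String): Option[Long] =
    TimestampField.findFirstMatchIn(line).map(_.group(1).toLong)

  /** Elapsed milliseconds, or None if either event is missing. */
  def durationMillis(lines: Iterator[String]): Option[Long] = {
    var start: Option[Long] = None
    var end: Option[Long] = None
    for (line <- lines) {
      EventField.findFirstMatchIn(line).map(_.group(1)) match {
        case Some("SparkListenerApplicationStart") => start = timestampOf(line)
        case Some("SparkListenerApplicationEnd")   => end = timestampOf(line)
        case _                                     => () // ignore all other events
      }
    }
    for (s <- start; e <- end) yield e - s
  }

  def main(args: Array[String]): Unit = {
    // args(0): path to one event log file, e.g. a file under spark.eventLog.dir
    val source = scala.io.Source.fromFile(args(0))
    try {
      durationMillis(source.getLines()) match {
        case Some(ms) => println(f"Application ran for ${ms / 1000.0}%.1f s")
        case None     => println("Start or end event not found (application still running?)")
      }
    } finally source.close()
  }
}
```

The History Server remains the simpler option when it is available; a script like this is mainly useful for collecting durations from many runs, for example to compare them against input size.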

Ram Ghadiyaram
  • Thanks, I was going to ask about keeping the information after the job ends, but you have answered it anyway. `spark.eventLog.enabled` is a configuration parameter specified on the command line when submitting the Spark job, correct? – pranav3688 Apr 30 '16 at 23:28
  • Yes, you are right. For example: `./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=true --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar` – Ram Ghadiyaram May 01 '16 at 03:35

Spark itself provides much more granular information about each stage of your Spark job. Go to the Spark web interface at http://your-driver-node:4040; you can also use the history server.

If you just need the execution time and are running in Spark standalone mode, go to the master's web UI at "http://your-master-node:8080", which lists each application submitted to the cluster along with its duration.

mpals
  • You can check the following URL: http://spark.apache.org/docs/latest/monitoring.html – mpals Apr 30 '16 at 07:23
  • I always use `http://your-driver-node:4040` to monitor my jobs, but it doesn't give me the end-to-end execution time, does it? If it does, then where? I will check the second link though, thanks! – pranav3688 Apr 30 '16 at 23:24
  • I can't seem to access 8080 and I don't see it documented. Would you have more info about this 8080 page? – flow2k Jul 16 '19 at 20:17

If you want, you can write a piece of code to measure the net execution time yourself.

Example:

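val t1 = System.nanoTime() // first line of your code: start of the measured section

// ... your Spark job runs here ...

val duration = (System.nanoTime() - t1) / 1e9d // last line of your code: elapsed time in seconds
println(s"Execution time: $duration s")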
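The System.nanoTime pattern above can be wrapped in a small reusable helper, so the start/stop bookkeeping is not repeated around every section you want to profile. This is a minimal sketch in plain Scala with no Spark dependency; the Timing object and its measure/timed methods are hypothetical names, not part of Spark's API.

```scala
// Hypothetical helper (not part of Spark) wrapping the System.nanoTime pattern.
object Timing {
  /** Runs `block` and returns its result together with the elapsed wall-clock seconds. */
  def measure[T](block: => T): (T, Double) = {
    val start = System.nanoTime()
    val result = block // the by-name block is evaluated exactly once, here
    (result, (System.nanoTime() - start) / 1e9d)
  }

  /** Like `measure`, but prints the elapsed time and returns only the result. */
  def timed[T](label: String)(block: => T): T = {
    val (result, seconds) = measure(block)
    println(f"$label took $seconds%.3f s")
    result
  }
}
```

In a driver program you could wrap an action, for example `Timing.timed("count")(rdd.count())`, where `rdd` stands for any RDD in your job. Because Spark transformations are lazy, wrapping only a transformation such as `map` measures almost nothing; time the action, or the whole body of the driver as in the snippet above. On Spark 2.1 and later, SparkSession also provides a built-in `spark.time { ... }` that prints the time taken by a block.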
venus