To the best of my understanding till date, in spark a job is submitted whenever an action is called on a dataset/dataframe. the job may further be divided into stages and tasks, which I understand how to find out the number of stages and tasks. Given below is my small code
val spark = SparkSession.builder().master("local").getOrCreate()
val df = spark.read.json("/Users/vipulrajan/Downloads/demoStuff/data/rows/*.json").select("user_id", "os", "datetime", "response_time_ms")
df.show()
df.groupBy("user_id").count().show
To the best of my understanding it should have submitted one job at line 4 when I read. one on the first show and one on the second show. The first two assumptions are correct, but for the second show it submits 5 jobs. I can't understand why. Below is the screenshot of my UI
as you can see job 0 for reading the json, job 1 for the first show and 5 jobs for the second show. Can anyone help me understand what is this job in the spark UI?