
My understanding so far was that each action creates one job in a Spark application. But consider the following scenario, where I just create a DataFrame using the .range() method:

df = spark.range(10)

Since my spark.default.parallelism is 10, the resulting DataFrame has 10 partitions. Now I simply run the .show() and .count() actions on the DataFrame:

df.show()
df.count()
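
As a sanity check (assuming a standard PySpark shell where spark is the active SparkSession), the partition count can be confirmed directly:

df.rdd.getNumPartitions()   # 10, matching spark.default.parallelism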

When I checked the Spark History UI, I saw 3 jobs for .show() and 1 job for .count():

[Screenshot: Spark History UI showing 3 jobs for .show() and 1 job for .count()]

Why are there 3 jobs for the .show() method?

I have read somewhere that .show() eventually calls .take() internally, and that take iterates through partitions, which decides the number of jobs. But I didn't understand that part. What exactly decides the number of jobs?
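
Here is a minimal sketch of my mental model (an assumption based on what I have read about take(), not Spark's actual source): take(n) scans partitions in rounds, submitting one job per round, starting with a single partition and widening each subsequent round by a scale-up factor (mirroring spark.sql.limit.scaleUpFactor, default 4) until n rows are collected. Since show() prints 20 rows, it appears to fetch 21 rows under the hood, which for 10 single-row partitions would give 3 rounds:

# Simplified model only; the function name and the scale-up rule are
# assumptions for illustration, not Spark's real implementation.
def jobs_for_take(rows_needed, num_partitions, rows_per_partition,
                  scale_up_factor=4):
    collected = 0   # rows gathered so far
    scanned = 0     # partitions scanned so far
    jobs = 0        # one Spark job per scan round
    batch = 1       # first round scans a single partition
    while collected < rows_needed and scanned < num_partitions:
        batch = min(batch, num_partitions - scanned)
        scanned += batch
        collected += batch * rows_per_partition
        jobs += 1
        batch *= scale_up_factor    # widen the next round
    return jobs

# With 10 partitions of 1 row each, fetching 21 rows gives 3 jobs:
# 1 partition, then 4, then the remaining 5.
print(jobs_for_take(21, 10, 1))   # 3

Is this roughly how Spark decides the job count, or is the actual rule different?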
