
This question comes from my other question. In the comments there, I observed a behaviour I want to ask about: do SparkContext jobs have to run in the main method only? The code below doesn't work: the Spark job is created, but the executor keeps running and never finishes.

import org.apache.spark.SparkContext

object App38 {
  val sc = new SparkContext("local[1]", "SimpleProg")

  val nums = sc.parallelize(List(1, 2, 3, 4))

  println(nums.reduce((a, b) => a - b)) // this job starts but never finishes

  def main(args: Array[String]): Unit = {
//    println(nums.reduce((a, b) => a - b))
  }
}

However, if I put only the reduce call in the main method (the code below), it runs fine:

import org.apache.spark.SparkContext

object App38 {

  val sc = new SparkContext("local[1]", "SimpleProg")

  val nums = sc.parallelize(List(1, 2, 3, 4))

//  println(nums.reduce((a, b) => a - b))

  def main(args: Array[String]): Unit = {
    println(nums.reduce((a, b) => a - b)) // runs fine when triggered from main
  }
}

What is this behaviour? I'm new to Spark, so any help is appreciated.

Dhruv

1 Answer


This has nothing to do with Spark; it is a Scala feature.

If you extend `App`, then whatever you write in the body of the object is executed as part of the main method. See https://www.scala-lang.org/api/2.13.3/scala/App.html

However, you commented out `extends App`, so a main method has to be provided as the entry point of your application. Hence whatever you write in your main method gets executed.
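For illustration, here is a minimal plain-Scala sketch of the two entry-point styles (the object names are made up for this example):

object WithApp extends App {
  // With `extends App`, the object body becomes the body of the
  // generated main method, so this line runs at startup.
  println("hello from the App body")
}

object WithMain {
  // Without `extends App`, an explicit main method is the entry point.
  def main(args: Array[String]): Unit =
    println("hello from main")
}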

Hope this clarifies your question.

Amit
  • Your answer explains the difference between an object extending `App` and not extending `App` but doesn't explain the difference between execution inside `main` (either extending `App` or manually) and execution outside `main`. Right? – Dmytro Mitin Apr 18 '23 at 20:28
  • I am trying to explain why his first snippet does not work whereas the second one works. Moreover, his snippet shows `extends App` as commented out, hence I included that in the answer. He might have done it unknowingly and thought it's a Spark issue, whereas it is just a Scala thing that he may not be aware of. – Amit Apr 18 '23 at 20:36
  • If you look at the previous question linked, you'll see that the question is why there is blocking in one case and there isn't in the other. – Dmytro Mitin Apr 18 '23 at 20:38
  • Hey Amit, got your point. But in my first code, even though `main` is empty, *it's still there*. If I put a print statement instead of a Spark job outside of main, it will get executed and print. In fact, in my first code the Spark job is created and is running; the issue is that the executor is never able to finish the job, it just keeps running. I've modified the question to make this point. – Dhruv Apr 19 '23 at 06:46
  • *"This has nothing to do with Spark, it is the Scala feature."* Actually, this seems to be Spark-specific behavior – Dmytro Mitin Apr 24 '23 at 02:12