I used to think that a Spark application finishes when all of its jobs succeed. But then I came across this parameter:
spark.driver.maxResultSize: Limit of total size of serialized results of all partitions for each Spark action (e.g. collect) in bytes. Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size is above this limit. Having a high limit may cause out-of-memory errors in driver (depends on spark.driver.memory and memory overhead of objects in JVM). Setting a proper limit can protect the driver from out-of-memory errors.
What happens to the rest of the application when a job is aborted?
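For concreteness, here is a minimal sketch of the kind of situation I mean (PySpark; the app name, data size, and partition count are just illustrative):

    from pyspark.sql import SparkSession

    # Deliberately small limit so that collecting a large RDD exceeds it.
    # "1m" is the documented minimum non-zero value for this setting.
    spark = (
        SparkSession.builder
        .appName("max-result-size-demo")
        .config("spark.driver.maxResultSize", "1m")
        .getOrCreate()
    )
    sc = spark.sparkContext

    big = sc.parallelize(range(10_000_000), numSlices=100)

    try:
        # collect() ships every partition's serialized results to the driver;
        # if their total size exceeds spark.driver.maxResultSize,
        # Spark aborts the job and raises an exception here.
        data = big.collect()
    except Exception as e:
        print("Job aborted:", e)

    # This is exactly what I'm unsure about: after the abort above,
    # is the SparkContext still usable for further jobs like this one,
    # or is the whole application effectively done?
    print(sc.parallelize([1, 2, 3]).count())

    spark.stop()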
As per the answer here describing the parameter spark.driver.maxResultSize:
The goal here is to protect your application from driver loss, nothing more.
How does aborting the job prevent driver loss?
Or, more broadly: what happens to the rest of the application when a job is aborted?