Job 65 cancelled because SparkContext was shut down

Question

I'm working on a shared Apache Zeppelin server. Almost every day, I try to run a command and get this error: Job 65 cancelled because SparkContext was shut down

I would love to learn more about what causes the SparkContext to shut down. My understanding is Zeppelin is a kube app that sends commands to a machine for the processing.

When a SparkContext shuts down, does that mean my bridge to the Spark cluster is down? And, if that's the case, how can I cause the bridge to the spark cluster to go down?

In this example, it happened when I was trying to upload data to S3.

This is the code

val myfiles = readParquet(
    startDate = new LocalDate(2020, 4, 1),
    endDate = new LocalDate(2020, 4, 7)
)

// Register the DataFrame under the name the SQL below selects from
myfiles.createOrReplaceTempView("myfiles")

val mySQLDF = spark.sql(s"""
    select [6 columns]
    from myfiles 
    join [other table]
    on [join_condition]
"""
)

mySQLDF.write.option("maxRecordsPerFile", 1000000).parquet(path)
// mySQLDF has 3M rows and they're all strings or dates

This is the stacktrace error

org.apache.spark.SparkException: Job aborted.
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
  at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:566)
  ... 48 elided
Caused by: org.apache.spark.SparkException: Job 44 cancelled because SparkContext was shut down
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:972)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:970)
  at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
  at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:970)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:2286)
  at org.apache.spark.util.EventLoop.stop(EventLoop.scala:84)
  at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2193)
  at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1949)
  at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
  at org.apache.spark.SparkContext.stop(SparkContext.scala:1948)
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:121)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:777)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:167)
  ... 70 more

Does this answer your question? [Job cancelled because SparkContext was shut down](https://stackoverflow.com/questions/53984506/job-cancelled-because-sparkcontext-was-shut-down) — reymon359, May 16 '20 at 13:48
Can you post the full stack trace, and also the code you are executing, if possible? This is mostly caused by memory issues. — Srinivas, May 18 '20 at 17:14
Great, I posted the stack trace and the code. I pulled one example, but this happens with many different blocks of code — Cauder, May 19 '20 at 00:18

score 6 · Accepted Answer · answered May 19 '20 at 13:26

Your job is being aborted at the write step. `Job aborted.` is the exception message for that failure, and the `Caused by` line in the trace ties it to the SparkContext being shut down.

Look into optimising the write step. `maxRecordsPerFile` might be the culprit, so try a lower number; you currently allow 1M records per file.
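To make the effect of that option concrete, here is a small sketch in plain Scala (no Spark needed). It assumes, as the Spark documentation for `maxRecordsPerFile` describes, that the value caps the number of records in any single output file and that zero or a negative value means no limit. `filesForTask` is a hypothetical helper name used only for illustration, not a Spark API.

```scala
object MaxRecordsPerFileSketch {
  /** Number of files one write task would produce for `rowsInTask` rows
    * under a given cap. A cap <= 0 is treated as "no limit".
    * Assumption: the task starts a new file each time the cap is reached.
    */
  def filesForTask(rowsInTask: Long, maxRecordsPerFile: Long): Long = {
    require(rowsInTask >= 1, "this sketch only covers non-empty tasks")
    if (maxRecordsPerFile <= 0) 1L
    else (rowsInTask + maxRecordsPerFile - 1) / maxRecordsPerFile // ceiling division
  }

  def main(args: Array[String]): Unit = {
    val rows = 3000000L // row count mentioned in the question
    for (cap <- Seq(1000000L, 250000L, 100000L))
      println(s"cap=$cap: ${filesForTask(rows, cap)} file(s) if a single task wrote all $rows rows")
  }
}
```

On a real write the total number of files also depends on how many tasks write data, so lowering the cap mainly bounds how large any one file can get.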

In general, `Job ${job.jobId} cancelled because SparkContext was shut down` means that an exception occurred which prevented the DAG from continuing, so it had to error out. The Spark scheduler throws this error when it hits an exception, which may be an unhandled exception in your code or a job failure for any other reason. Because the DAG scheduler is stopped, the entire application is stopped as well (this message is part of the cleanup).
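Since the useful message is often nested under a generic outer one, it can help to print the whole cause chain when a job fails. The following is a minimal sketch in plain Scala; `causes` is a hypothetical helper, and the two `RuntimeException` instances are stand-ins that only mimic the messages from the trace in the question (the real ones are `SparkException`).

```scala
object CauseChain {
  /** Returns the exception followed by each nested cause, outermost first. */
  def causes(t: Throwable): List[Throwable] =
    Iterator.iterate(t)(_.getCause).takeWhile(_ != null).toList

  def main(args: Array[String]): Unit = {
    // Stand-in exceptions mirroring the two messages in the question's trace
    val inner = new RuntimeException("Job 44 cancelled because SparkContext was shut down")
    val outer = new RuntimeException("Job aborted.", inner)

    // Print every level so the innermost message is not hidden by the wrapper
    causes(outer).foreach(e => println(s"${e.getClass.getSimpleName}: ${e.getMessage}"))
  }
}
```

In a notebook, the same helper could be called from a `catch` block around the write to show every level of the failure rather than only the top one.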

To your questions -

When a SparkContext shuts down, does that mean my bridge to the Spark cluster is down?

SparkContext represents the connection to a Spark cluster, so if it's dead you can't run jobs on the cluster because you have lost the link. On Zeppelin, you can just restart the SparkContext (Menu -> Interpreter -> Spark Interpreter -> restart).

And, if that's the case, how can I cause the bridge to the spark cluster to go down?

Through a SparkException or Error in a job, or manually by calling `sc.stop()`.
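Both answers can be observed directly. The sketch below assumes Spark SQL is on the classpath and uses a throwaway local session so that nothing shared is affected; in Zeppelin, `spark` and `sc` are already provided by the interpreter. `jobSucceeds` is a hypothetical helper, while `isStopped`, `parallelize`, `count`, and `stop` are existing `SparkContext`/RDD methods.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import scala.util.Try

object StoppedContextDemo {
  /** Runs a tiny job and reports whether it completed without throwing. */
  def jobSucceeds(sc: SparkContext): Boolean =
    Try(sc.parallelize(1 to 10).count()).isSuccess

  def main(args: Array[String]): Unit = {
    // Assumption: a local session used only for this demonstration
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("stopped-context-demo")
      .getOrCreate()
    val sc = spark.sparkContext

    println(s"before stop: isStopped=${sc.isStopped}, jobSucceeds=${jobSucceeds(sc)}")

    sc.stop() // manually take the "bridge" down

    // Once the context is stopped, new jobs are rejected instead of being run
    println(s"after stop: isStopped=${sc.isStopped}, jobSucceeds=${jobSucceeds(sc)}")
  }
}
```

On a shared Zeppelin server, calling `sc.stop()` would most likely affect everyone using the same interpreter, so checking `sc.isStopped` is the safer way to find out whether a restart is needed.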

On Zeppelin, go to `Menu -> Interpreter -> Spark Interpreter -> Click restart` (there is also a [hack mentioned here](https://stackoverflow.com/a/56754005/2142994)) — Ani Menon, May 20 '20 at 07:33
