
I am working in Zeppelin writing Spark SQL queries, and sometimes I suddenly start getting this error (without having changed any code):

Cannot call methods on a stopped SparkContext.

Then the output says further down:

The currently active SparkContext was created at:

(No active SparkContext.)

This obviously doesn't make sense. Is this a bug in Zeppelin? Or am I doing something wrong? How can I restart the SparkContext?

Thank you

The Puma

4 Answers


I have faced this problem a couple of times.

If you are setting your master to yarn-client, the cause might be a stop/restart of the Resource Manager: the interpreter process may still be running, but the Spark context (which is a YARN application) no longer exists.

You can check whether the Spark context is still running by consulting your Resource Manager web interface and looking for an application named Zeppelin.
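If you prefer the command line, a quick check is also possible with the YARN CLI (assuming it is on your path; the grep filter is just an example, adjust it to your application name):

    yarn application -list | grep -i zeppelin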

Sometimes restarting the interpreter process from within Zeppelin (interpreter tab --> spark --> restart) will solve the problem.

Other times you need to (see the sketch after this list):

  • kill the Spark interpreter process from the command line
  • remove the Spark interpreter PID file
  • the next time you run a paragraph, Zeppelin will start a new Spark context
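A minimal sketch of those steps, assuming a default Zeppelin layout (the PID file location and name vary with your Zeppelin version and interpreter group, so adjust the path):

    ps -ef | grep zeppelin                                 # find the Spark interpreter PID
    kill <interpreter-PID>                                 # stop the hung process
    rm $ZEPPELIN_HOME/run/zeppelin-interpreter-spark-*.pid # remove the stale PID file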
user1314742

I'm facing the same problem running multiple jobs in PySpark. It seems that in Spark 2.0.0, with SparkSession, when I call spark.stop(), SparkSession runs the following code:

# in SparkSession.stop()
self._sc.stop()
# in SparkContext.stop()
self._jsc = None

Then, when I try to create a new job with a new SparkContext, SparkSession returns the same SparkContext as before, with self._jsc = None.

I solved it by setting SparkSession._instantiatedContext = None after spark.stop(), which forces SparkSession to create a new SparkContext the next time I ask for one.

It's not the best option, but meanwhile it solves my issue.
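A minimal sketch of that workaround (Spark 2.0.x; note that _instantiatedContext is a private attribute, so this may break in other versions, and the app names here are just placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("job-1").getOrCreate()
    # ... run the first job ...
    spark.stop()

    # clear the cached context so getOrCreate() builds a fresh one
    SparkSession._instantiatedContext = None

    spark = SparkSession.builder.appName("job-2").getOrCreate()  # new SparkContext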

Franzi

I've noticed this issue more when running PySpark commands: even with trivial variable declarations, a cell execution hangs in the running state. As mentioned above by user1314742, just killing the relevant PID solves the issue for me.

e.g.:

ps -ef | grep zeppelin    # find the hung Spark interpreter PID

This is for the case where restarting the Spark interpreter and restarting the Zeppelin notebook do not solve the issue, I guess because Zeppelin cannot control the hung PID itself.
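Once the grep above shows the interpreter PID, a minimal follow-up (the placeholder is whatever PID you found):

    kill <interpreter-PID>    # or kill -9 if the process ignores the default signal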

DIF

Could you check whether your driver memory is sufficient? I solved this issue by:

  1. enlarging the driver memory
  2. tuning GC:

    --conf spark.cleaner.periodicGC.interval=60 
    --conf spark.cleaner.referenceTracking.blocking=false
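For example, in Zeppelin these settings can be passed to the Spark interpreter through SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh (a sketch; the 8g driver memory is a placeholder value, not from the original answer):

    export SPARK_SUBMIT_OPTIONS="--driver-memory 8g \
      --conf spark.cleaner.periodicGC.interval=60 \
      --conf spark.cleaner.referenceTracking.blocking=false"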
    
Derlin