
I am working in Zeppelin writing Spark SQL queries, and sometimes I suddenly start getting this error (without having changed any code):

Cannot call methods on a stopped SparkContext.

Then the output says further down:

The currently active SparkContext was created at:

(No active SparkContext.)

This obviously doesn't make sense. Is this a bug in Zeppelin? Or am I doing something wrong? How can I restart the SparkContext?

Thank you

The Puma

4 Answers


I have faced this problem a couple of times.

If you are setting your master to yarn-client, the cause might be a stop/restart of the Resource Manager: the interpreter process may still be running, but the Spark context (which is a YARN application) no longer exists.

You can check whether the Spark context is still running by consulting your Resource Manager web interface and looking for an application named Zeppelin.
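If you prefer the command line, a quick check is also possible with the YARN CLI (assuming it is on your path; the grep filter is just an example, adjust it to your application name):

    yarn application -list | grep -i zeppelin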

Sometimes restarting the interpreter process from within Zeppelin (interpreter tab --> spark --> restart) will solve the problem.

Other times you need to (see the sketch after this list):

  • kill the Spark interpreter process from the command line
  • remove the Spark interpreter PID file
  • the next time you run a paragraph, Zeppelin will start a new Spark context
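A minimal sketch of those steps, assuming a default Zeppelin layout (the PID file location and name vary with your Zeppelin version and interpreter group, so adjust the path):

    ps -ef | grep zeppelin                                 # find the Spark interpreter PID
    kill <interpreter-PID>                                 # stop the hung process
    rm $ZEPPELIN_HOME/run/zeppelin-interpreter-spark-*.pid # remove the stale PID file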
user1314742

I'm facing the same problem running multiple jobs in PySpark. It seems that in Spark 2.0.0, with SparkSession, when I call spark.stop(), SparkSession runs the following code:

# in SparkSession.stop()
self._sc.stop()
# in SparkContext.stop()
self._jsc = None

Then, when I try to create a new job with a new SparkContext, SparkSession returns the same SparkContext as before, with self._jsc = None.

I solved it by setting SparkSession._instantiatedContext = None after spark.stop(), which forces SparkSession to create a new SparkContext the next time I ask for one.

It's not the best option, but meanwhile it solves my issue.
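A minimal sketch of that workaround (Spark 2.0.x; note that _instantiatedContext is a private attribute, so this may break in other versions, and the app names here are just placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("job-1").getOrCreate()
    # ... run the first job ...
    spark.stop()

    # clear the cached context so getOrCreate() builds a fresh one
    SparkSession._instantiatedContext = None

    spark = SparkSession.builder.appName("job-2").getOrCreate()  # new SparkContext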

Franzi

I've noticed this issue more when running PySpark commands: even with trivial variable declarations, a cell execution hangs in the running state. As mentioned above by user1314742, just killing the relevant PID solves the issue for me.

e.g.:

ps -ef | grep zeppelin    # find the hung Spark interpreter PID

This is for the case where restarting the Spark interpreter and restarting the Zeppelin notebook do not solve the issue, I guess because Zeppelin cannot control the hung PID itself.
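Once the grep above shows the interpreter PID, a minimal follow-up (the placeholder is whatever PID you found):

    kill <interpreter-PID>    # or kill -9 if the process ignores the default signal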

DIF

Could you check whether your driver memory is sufficient? I solved this issue by:

  1. enlarging the driver memory
  2. tuning GC:

    --conf spark.cleaner.periodicGC.interval=60 
    --conf spark.cleaner.referenceTracking.blocking=false
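For example, in Zeppelin these settings can be passed to the Spark interpreter through SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh (a sketch; the 8g driver memory is a placeholder value, not from the original answer):

    export SPARK_SUBMIT_OPTIONS="--driver-memory 8g \
      --conf spark.cleaner.periodicGC.interval=60 \
      --conf spark.cleaner.referenceTracking.blocking=false"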
    
Derlin