
How can I check the SparkSession status in my PySpark code? The requirement is to check whether the SparkSession is active or not; if it is not active, create another Spark session and call some function.

I am writing and running this code in a Jupyter notebook.

1. I create the SparkSession:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("yarn") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.shuffle.spill.compress", "true") \
    .config("spark.shuffle.service.enabled", "true") \
    .config("spark.io.compression.codec", "snappy") \
    .config("spark.kryoserializer.buffer.max", "250m") \
    .config("spark.driver.memory", memory) \
    .config("spark.driver.cores", cores) \
    .config("spark.executor.cores", cores) \
    .config("spark.executor.memory", memory) \
    .config("spark.executor.instances", cores) \
    .enableHiveSupport() \
    .getOrCreate()
2. Evaluating spark prints the SparkSession details.

3. Checking the variable:

if(spark):
    print("yes")
else:
    print("no")

prints "yes"

4. spark.stop()

It stops the Spark application -- I checked in the UI.

But when I run the code from the third step again:

5.

if(spark):
    print("yes")
else:
    print("no")

Prints "yes as output

6. But when I evaluate spark by itself, I get an error:

AttributeError: 'NoneType' object has no attribute 'sc'
7. But the weird thing I saw was when I ran my next command:
df = spark.read.csv(file_name) 

It created another application and started executing the code.

I am trying to understand whether the SparkSession was killed or not.

Observations:

a. if(spark) gives TRUE, as it prints the lines underneath it.
b. When I just write "spark", it gives me an error.
c. spark.read.csv did not give any error and started a new application, but threw an error after a while: "Cannot call methods on a stopped SparkContext."

The requirement is that if the SparkSession somehow stops or fails while my code/application is running, it should automatically restart.

I was thinking of writing something like this:

import time

def func1():
    # create the Spark session
    # code to execute
    ...

def func2():
    while spark_is_active():      # placeholder check for an active session
        time.sleep(200)
    # the session is no longer active, so recreate it
    func1()

func1()
func2()

1 Answer


Is this functionality that you need to manage yourself? As the method name suggests, SparkSession.builder.getOrCreate() already behaves this way:

https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.SparkSession.builder.getOrCreate.html

Gets an existing SparkSession or, if there is no existing one, creates a new one based on the options set in this builder.

If you don't need to manage this yourself, then I would recommend just using this method and leaving out the excess code.
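As a rough sketch of what that means in practice (assuming PySpark 3.x; your config options are omitted here for brevity):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()   # first call creates the session
spark.stop()                                 # stops the underlying SparkContext

# because the previous session is stopped, getOrCreate() builds a new one
spark = SparkSession.builder.getOrCreate()
print(spark.sparkContext.applicationId)      # ID of the new application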

If you do need to, then I believe your issue stems from here:

if (spark):
    print("yes")

This is checking the "truthiness" of the variable spark. Here's a good post discussing this. In short: if (spark) only checks whether the variable spark is falsy (for example, None). spark.stop() just stops the SparkSession; it does not un-assign the variable, so the check still passes.
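A quick sketch of the difference (assuming PySpark 3.x, where SparkSession.getActiveSession() is available):

spark = SparkSession.builder.getOrCreate()
spark.stop()

print(bool(spark))                       # True -- the Python object still exists
print(SparkSession.getActiveSession())   # None -- stop() cleared the active session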

Try this instead:

if (spark.getActiveSession()):
    print('yes')
else:
    print('no')
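And since the goal is to restart automatically, you can combine that check with getOrCreate(). A minimal sketch; ensure_spark is just an illustrative helper name, not part of the PySpark API:

def ensure_spark(spark):
    # rebuild the session only if the current one has been stopped
    if SparkSession.getActiveSession() is None:
        spark = SparkSession.builder.getOrCreate()
    return spark

spark = ensure_spark(spark)
df = spark.read.csv(file_name)   # file_name as in your example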