
I have a SparkContext sc with a highly customised SparkConf(). How do I use that SparkContext to create a SparkSession? I found this post: https://stackoverflow.com/a/53633430/201657 that shows how to do it using Scala:

val spark = SparkSession.builder.config(sc.getConf).getOrCreate()

but when I try to apply the same technique using PySpark:

from pyspark.sql import SparkSession
spark = SparkSession.builder.config(sc.getConf()).enableHiveSupport().getOrCreate()

it fails with the error:

AttributeError: 'SparkConf' object has no attribute '_get_object_id'


As I said, I want to use the same SparkConf in my SparkSession as is used in the SparkContext. How do I do it?


UPDATE

I've done a bit of fiddling about:

from pyspark.sql import SparkSession
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
sc.getConf().getAll() == spark.sparkContext.getConf().getAll()

returns

True

so the SparkConf of both the SparkContext & the SparkSession are the same. My assumption from this is that SparkSession.builder.getOrCreate() will use an existing SparkContext if it exists. Am I correct?

jamiet
  • How about persisting the config and reading it like in the example here: https://stackoverflow.com/questions/48660725/how-to-use-custom-config-file-for-sparksession-without-using-spark-submit-to-su – michalrudko Jun 14 '19 at 12:08
  • hmmm...could do, feels like a long way around though. I'm hoping there's a way to use the SparkConf from `sc` – jamiet Jun 14 '19 at 12:12
  • 1
    Regarding your question - correct. In order to apply a different config you'd need to call spark.stop() and create a new SparkSession with a different config. – michalrudko Jun 14 '19 at 14:23
  • Got it. Thanks very much. – jamiet Jun 14 '19 at 14:24

1 Answer


Your assumption:

My assumption from this is that SparkSession.builder.getOrCreate() will use an existing SparkContext if it exists. Am I correct?

is correct. However, you can also explicitly pass a SparkContext (with custom config set through SparkConf) to your SparkSession.

from pyspark.sql import SparkSession
from pyspark import SparkConf, SparkContext


spark_conf = SparkConf().setAppName("DefaultSparkSession")
# Define custom configuration properties
spark_conf.set("spark.executor.memory", "2g")
spark_conf.set("spark.executor.cores", "4")


context = SparkContext(conf=spark_conf)
# Define custom context properties
context.setCheckpointDir("checkpoints")

# Pass the existing SparkContext directly to the SparkSession constructor.
# Chaining .builder onto an already-constructed session would discard that
# session and return a fresh Builder, losing the custom context.
spark = SparkSession(context)

print(spark.sparkContext._conf.getAll())
Psychotechnopath