
I am importing SparkSession as follows in PySpark:

from pyspark.sql import SparkSession

Then I create a SparkSession:

spark = SparkSession.builder.appName("test").getOrCreate()

and try to access the SparkContext:

spark.SparkContext.broadcast(...)

However, I get an error saying that SparkContext does not exist. How can I access it in order to create broadcast variables?

Markus
  • Set pyspark as an environment variable (https://stackoverflow.com/questions/23256536/importing-pyspark-in-python-shell) – Morse Mar 12 '18 at 20:19
  • @Prateek: That's not the question. I am asking how to access the SparkContext through `spark`, which is the instance of SparkSession. – Markus Mar 12 '18 at 20:24
  • `from pyspark import SparkContext as sc` – if that doesn't work, you have not set pyspark in the environment variables, or the Spark server is not running. – Morse Mar 12 '18 at 20:33
  • @Prateek: No, that will not work, because it has to be tied to the created SparkSession. I solved this problem myself: use `spark.sparkContext.broadcast(...)` instead of `spark.SparkContext.broadcast(...)`. – Markus Mar 12 '18 at 20:36
  • Great! I thought you were referring to `sc` in general. – Morse Mar 12 '18 at 20:39

2 Answers


You almost got it right; it's a lowercase s at the beginning:

>>> spark.sparkContext
<SparkContext master=local[*] appName=PySparkShell>
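For the broadcast use case from the question, here is a minimal sketch of what you can do once you have the context (the dictionary is just illustrative data):

>>> lookup = spark.sparkContext.broadcast({"a": 1, "b": 2})
>>> lookup.value["a"]
1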
Roberto Congiu

Assuming you have a Spark session:

from pyspark.sql import SparkSession

spark_session = SparkSession \
    .builder \
    .enableHiveSupport() \
    .getOrCreate()

The SparkContext can be accessed with

spark_context = spark_session._sc

or

spark_context = spark_session.sparkContext
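
Both lines return the same underlying object: `sparkContext` is the public property, while `_sc` is an internal attribute that may change between releases, so the public property is the safer choice. A quick sanity check, assuming the `spark_session` created above:

spark_context = spark_session.sparkContext

# The public property returns the same object as the internal attribute.
assert spark_context is spark_session._sc

# Example: broadcast a small read-only list to the executors.
broadcast_var = spark_context.broadcast([1, 2, 3])
print(broadcast_var.value)  # [1, 2, 3]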
Giorgos Myrianthous