
I am a beginner in PySpark, trying to execute a few lines of code in a Jupyter notebook. I followed the (fairly old) instructions available on the internet (https://changhsinlee.com/install-pyspark-windows-jupyter/) to configure PySpark after installing Python 3.8.5, Java (jdk-16), and spark-3.1.1-bin-hadoop2.7.
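For reference, the environment variables that guide sets up can also be set from inside the notebook before calling findspark.init(). A minimal sketch, assuming hypothetical install paths (adjust them to wherever the JDK and Spark were actually unpacked):

import os

# Hypothetical Windows paths -- replace with your real install locations.
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk-11"
os.environ["SPARK_HOME"] = r"C:\spark\spark-3.1.1-bin-hadoop2.7"
os.environ["HADOOP_HOME"] = r"C:\hadoop"  # folder expected to contain bin\winutils.exe on Windows

import findspark
findspark.init()  # findspark locates Spark via SPARK_HOME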

Below are the lines that executed successfully after installation; an exception is thrown at df.show(). I have added all the necessary environment variables. Please help me resolve this.

# shell / notebook cell:
pip install pyspark
pip install findspark

# Python:
import findspark
findspark.init()

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.sql("SELECT 'Hello' AS greeting")  # the original '''Hello''' on its own is not a valid SQL statement
df.show()  # <-- exception is thrown here

I have added the error in the comments section.

Note: I am a beginner in Python and have no Java knowledge.

  • Exception (from the traceback): "# This SparkContext may be an existing one." at line 228, `sc = SparkContext.getOrCreate(sparkConf)`; lines 229-230: "# Do not update `SparkConf` for existing `SparkContext`, as it's shared by all sessions." – NikRED Mar 21 '21 at 14:20
  • Check this once: https://stackoverflow.com/questions/44502872/how-can-i-get-the-current-sparksession-in-any-place-of-the-codes/44504213 – Emad Mar 21 '21 at 16:20
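For context, the "This SparkContext may be an existing one" message in that traceback comes from getOrCreate() reusing a context that is already running. A minimal sketch of that behaviour (not the fix, just what the message refers to):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # creates the session on first call
same = SparkSession.builder.getOrCreate()   # returns the existing session instead of building a new one
print(spark is same)  # True -- all callers share one session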

1 Answer


Had to change the Java version to Java 11. It works now. Spark 3.1.1 supports Java 8 and Java 11 but not newer JDKs, so the originally installed JDK 16 was the cause of the exception.
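A quick way to verify the fix, as a minimal sketch: check which Java the shell resolves before starting Spark (assuming java is on PATH), then run a trivial query.

import subprocess

# java prints its version banner to stderr, not stdout
print(subprocess.run(["java", "-version"], capture_output=True, text=True).stderr)

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print(spark.version)  # e.g. 3.1.1
spark.sql("SELECT 'Hello' AS greeting").show()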
