
I downloaded Apache Spark 3.2.0 (the latest release) together with the bundled Hadoop binaries, and Java SE Development Kit 17.0.1 is installed as well.

I am not even able to initialize a Spark session.

Input:

import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.sql('''select 'spark' as hello ''')
df.show()

Output:

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)

2 Answers


As you can read at https://spark.apache.org/docs/3.2.0/:

Spark 3.2.0 only supports Java versions 8-11. I had the same issue on Linux, and switching from Java 17 to Java 11 resolved it in my case.

By the way, Spark 3.3.0 does support Java 17.
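If you have several JDKs installed and do not want to change the system-wide setting, you can also point PySpark at a Java 11 installation from inside the Python process, as long as you do it before the first SparkSession is created. A minimal sketch, assuming Java 11 lives at the path below (adjust it for your machine):

import os
from pyspark.sql import SparkSession

# Assumed location of a Java 11 installation -- replace with your own path.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]

# JAVA_HOME has to be set before the session is built, because that is
# when PySpark launches the driver JVM.
spark = SparkSession.builder.getOrCreate()
spark.sql("select 'spark' as hello").show()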


I faced the same issue today and fixed it by switching the JDK from 17 to 8 (only for starting Spark), as described below. My setup:

  • spark-3.2.1
  • hadoop3.2
  • python 3.10
 File "D:\sw.1\spark-3.2.1-bin-hadoop3.2\python\lib\py4j-0.10.9.3-src.zip\py4j\protocol.py", line 326, in get_return_value
   raise Py4JJavaError(py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.: > java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$

The JAVA_HOME environment variable was pointing to JDK 17.

Quick fix (in case you want to keep the environment variable as it is, but use JDK 8 for Spark only):

(1) Create a batch file (start-pyspark.bat) in D:\.
(2) Add the lines below:

set JAVA_HOME=D:\sw.1\jdk1.8.0_332
set PATH=%PATH%;%JAVA_HOME%\bin;%SPARK_HOME%\bin;%HADOOP_HOME%\bin;
pyspark 

(3) In cmd, type start-pyspark.bat and press Enter.

d:\>start-pyspark.bat

d:\>set JAVA_HOME=D:\sw.1\jdk1.8.0_332

d:\>set PATH=D:\sw.1\py.3.10\Scripts\;D:\sw.1\py.3.10\;C:\Program Files\Zulu\zulu-17\bin;C:\Program Files\Zulu\zulu-17-jre\bin;C:\windows\system32;....;D:\sw.1\jdk1.8.0_332\bin;D:\sw.1\spark-3.2.1-bin-hadoop3.2\bin;D:\sw.1\hadoop\bin;

d:\>pyspark
Python 3.10.4 (tags/v3.10.4:9d38120, Mar 23 2022, 23:13:41) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/05/27 18:29:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.2.1

(4) If you close this Spark prompt and cmd and restart, the environment is back in its clean state, with JDK 17 set as JAVA_HOME from the environment variables.
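To double-check which JVM a running session actually picked up, you can ask the driver for its java.version system property through the py4j gateway. This is just a quick diagnostic sketch; note that _jvm is a private attribute of SparkContext, not a stable public API:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Query the driver JVM via the py4j gateway (_jvm is a private attribute).
print(spark.sparkContext._jvm.java.lang.System.getProperty("java.version"))
# With the batch file above, this should print something like "1.8.0_332".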
