I installed PySpark 3.2.0 on Windows 10 with Hadoop 3.3.1 following this link. Because of proxy issues, I had to download a winutils.exe built for a different Hadoop version (i.e., not the one corresponding to Hadoop 3.3.1).
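For reference, here is a quick sanity check of my environment variables and the winutils.exe location, run from Python (a minimal sketch; the commented paths are placeholders for my actual install directories):

import os

# Placeholder examples -- actual values depend on the install locations
print(os.environ.get("HADOOP_HOME"))  # e.g. C:\hadoop
print(os.environ.get("JAVA_HOME"))    # e.g. C:\Program Files\Java\jdk1.8.0_301
print(os.environ.get("SPARK_HOME"))   # e.g. C:\spark\spark-3.2.0-bin-hadoop3.2

# winutils.exe is expected under %HADOOP_HOME%\bin
winutils = os.path.join(os.environ.get("HADOOP_HOME", ""), "bin", "winutils.exe")
print(winutils, "exists:", os.path.exists(winutils))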
When I open the command prompt and type pyspark, there don't seem to be any errors (only warnings):
Python 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/10/24 16:12:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/
Using Python version 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021 20:19:38)
Spark context available as 'sc' (master = local[*], app id = local-163510634343847).
SparkSession available as 'spark'.
>>> 21/10/24 16:12:40 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
However, when I try to run another program that uses pyspark, I get the following errors:
py4j.protocol.Py4JJavaError: An error occurred while calling o378.parquet
Caused by: java.lang.UnsatisfiedLinkError
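I can't share the full program, but the failing operation boils down to a parquet write. A minimal sketch of the kind of call that fails (the app name, data, and output path are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repro").getOrCreate()

# Sample data is a placeholder; any parquet write hits the same error
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").parquet("C:/tmp/out.parquet")  # raises Py4JJavaError here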
The outputs of java -version and pyspark --version are:
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b25)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b25, mixed mode)
and
Using Scala version 2.12.15, Java HotSpot(TM) Client VM, 1.8.0_201
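In case it helps narrow this down, the JVM that the running Spark session actually uses can be queried from inside the pyspark shell (sc._jvm is an internal PySpark handle to the driver JVM, so this is a diagnostic sketch rather than a supported API):

# From inside the pyspark shell, where sc is the SparkContext
print(sc._jvm.java.lang.System.getProperty("java.version"))
print(sc._jvm.java.lang.System.getProperty("java.home"))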
Could this error be caused by the mismatch of the Java versions (1.8.0_201 vs 1.8.0_301)? Or is it most likely caused by having the wrong version of winutils.exe?