
I know this question has been posted before, but I tried implementing the suggested solutions and none of them worked for me. I installed Spark for Jupyter Notebook using this tutorial:

https://medium.com/@GalarnykMichael/install-spark-on-mac-pyspark-453f395f240b#.be80dcqat

I installed the latest version of Apache Spark on my Mac.

When I try to run the following code in Jupyter

wordcounts = sc.textFile('words.txt')

I get the following error:

name 'sc' is not defined

When I try adding this code:

from pyspark import SparkContext, SparkConf
sc = SparkContext()

I get the following error:

An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.util.StringUtils
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)

I added the following to my bash profile:

export SPARK_PATH=~/spark-2.2.1-bin-hadoop2.7
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

# For Python 3, you have to add the line below or you will get an error
# export PYSPARK_PYTHON=python3
alias snotebook='$SPARK_PATH/bin/pyspark --master local[2]'

Please help me resolve this.

  • How do you set SPARK_PATH? Could you provide the full URL? – george Jan 18 '18 at 22:19
  • I added the following to the bash export SPARK_PATH=~/spark-2.2.1-bin-hadoop2.7 export PYSPARK_DRIVER_PYTHON="jupyter" export PYSPARK_DRIVER_PYTHON_OPTS="notebook" #For python 3, You have to add the line below or you will get an error # export PYSPARK_PYTHON=python3 alias snotebook='$SPARK_PATH/bin/pyspark --master local[2]' – Susha Suresh Jan 18 '18 at 22:24
  • Which version of Java do you have? Can you type java -version and share the result? – george Jan 18 '18 at 22:42
  • `PYSPARK_DRIVER_PYTHON="jupyter"` is a really crappy solution, and it should be avoided: https://stackoverflow.com/questions/47824131/configuring-spark-to-work-with-jupyter-notebook-and-anaconda/47870277#47870277 – desertnaut Jan 18 '18 at 23:15
  • Sorry for the late reply. Version is 9 – Susha Suresh Jan 19 '18 at 02:31

1 Answer


These steps solved my problem (local PySpark with Jupyter Notebook setup on Windows).

The error I saw in Jupyter Notebook (screenshot not reproduced here).


  1. Download and install Java 8: https://www.oracle.com/java/technologies/downloads/#java8-windows

  2. Download spark-3.2.1-bin-hadoop2.7: https://spark.apache.org/downloads.html

  • Unpack the .tgz file using 7-Zip or another tool
  • Put it somewhere like C:\spark-3.2.1-bin-hadoop2.7

Note: we will use this path for the environment variables below.

  3. Download winutils.exe: https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin
  • Put it in C:\Hadoop\bin

  4. Download and install Python on Windows: https://www.python.org/downloads/

  5. Add environment variables:

In the Settings window, under Related Settings, click Advanced system settings. On the Advanced tab, click Environment Variables. Click New to create a new environment variable. Click Edit to modify an existing environment variable.

5.1. User variables:

  • JAVA_HOME : C:\Program Files\Java\jdk-1.8
  • PATH : %JAVA_HOME%\bin
  • HADOOP_HOME : C:\Hadoop
  • PYSPARK_DRIVER_PYTHON : jupyter
  • PYSPARK_DRIVER_PYTHON_OPTS : notebook
  • PYSPARK_PYTHON : xxxxx\AppData\Local\Programs\Python\Python39\Scripts
  • SPARK_HOME : C:\spark-3.2.1-bin-hadoop2.7
  • SPARK_LOCAL_IP : localhost

5.2. System variables (add these entries to the Path variable; an in-notebook alternative is sketched after this list):

  • C:\Program Files\Java\jdk-20\bin
  • C:\spark-3.2.1-bin-hadoop2.7\bin
  • C:\Hadoop\bin
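
If editing the Windows dialogs is awkward, roughly the same variables can also be set from inside the notebook before Spark starts. This is only a sketch, not part of the original steps: the paths are examples that must match your own install locations, and it assumes the findspark package is installed (pip install findspark).

    import os

    # Example paths only -- adjust to wherever you installed the JDK, Hadoop/winutils, and Spark.
    os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk-1.8"
    os.environ["HADOOP_HOME"] = r"C:\Hadoop"
    os.environ["SPARK_HOME"] = r"C:\spark-3.2.1-bin-hadoop2.7"
    os.environ["SPARK_LOCAL_IP"] = "localhost"

    # Make winutils.exe and the JDK visible to the JVM that Spark launches.
    os.environ["PATH"] = os.pathsep.join([
        os.path.join(os.environ["JAVA_HOME"], "bin"),
        os.path.join(os.environ["HADOOP_HOME"], "bin"),
        os.environ["PATH"],
    ])

    import findspark
    findspark.init()  # reads the SPARK_HOME set above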

Testing:

  1. Open cmd and run java -version:

    C:\Users\xxxxxxx>java -version

It should return something like:

java version "1.8.0_371"
Java(TM) SE Runtime Environment (build 1.8.0_371-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.371-b11, mixed mode)
  2. In cmd, run:

    C:\Users\xxxxxxx>pyspark

This command redirects you to http://localhost:8890/tree.

  3. Create a new notebook, enter the code below, and run it (a word-count follow-up is sketched after this block):

    import findspark
    findspark.init()

    import pyspark  # only run after findspark.init()
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()

    df = spark.sql('''select 'spark' as India ''')
    df.show()
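
Once the session is up, the sc from the original question is simply spark.sparkContext, so the word-count line from the question should work as well. A minimal sketch, assuming a words.txt file exists in the notebook's working directory:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext  # the `sc` the question expected to be predefined

    # Assumes words.txt exists in the notebook's working directory.
    wordcounts = sc.textFile('words.txt')
    print(wordcounts.count())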

After following the above steps, the code runs successfully (screenshot not reproduced here).

------------- Note ------------------

If everything is set up and you are still seeing an error like "Using Spark's default log4j ... ERROR SparkContext ...", try the steps below (a quick verification snippet follows the list):

  1. Close the cmd window, reopen it, and run the pyspark command again.

  2. Restart your system, reopen cmd, and run the pyspark command again.

  3. Check your Java version; the latest Java releases sometimes raise errors with PySpark. The combination of jdk-1.8 and spark-3.2.1-bin-hadoop2.7 is what worked for me.
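
As a quick sanity check, the sketch below prints which JDK and Spark installation the notebook session is actually picking up; it assumes only that the environment variables above are set and that pyspark is importable.

    import os
    import subprocess

    # Confirm which JDK and Spark install the notebook will use.
    print("JAVA_HOME  =", os.environ.get("JAVA_HOME"))
    print("SPARK_HOME =", os.environ.get("SPARK_HOME"))

    # `java -version` writes to stderr, so capture it explicitly.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    print(result.stderr.strip())

    import pyspark
    print("pyspark", pyspark.__version__)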

Vijay