PySpark java.lang.ExceptionInInitializerError Caused by: java.lang.StringIndexOutOfBoundsException

Question

I followed these instructions and installed Apache Spark (PySpark) 2.3.1 on my machine, which has the following specifications:

Ubuntu 18.04
JDK 10
Python 3.6

When I create a SparkSession either indirectly by calling pyspark from the shell or by directly creating a session in my app with:

spark = pyspark.sql.SparkSession.builder.appName('test').getOrCreate()

I get the following exception:

Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
....
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3107)
    at java.base/java.lang.String.substring(String.java:1873)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:52)
    ... 22 more
Traceback (most recent call last):
  File "/home/welshamy/tools/anaconda3/lib/python3.6/site-packages/pyspark/python/pyspark/shell.py", line 38, in <module>
    SparkContext._ensure_initialized()
  File "/home/welshamy/tools/anaconda3/lib/python3.6/site-packages/pyspark/context.py", line 292, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/home/welshamy/tools/anaconda3/lib/python3.6/site-packages/pyspark/java_gateway.py", line 93, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

If I'm using a Jupyter notebook, I also get this exception in the notebook:

Exception: Java gateway process exited before sending the driver its port number

All the solutions I found and followed [1,2,3] point toward environment variable definitions, but none of them worked for me.

Sam · Accepted Answer · 2018-07-03T11:39:07.220

2

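PySpark 2.3.1 does not support JDK 10 or later. The trace suggests why: Hadoop's Shell class calls substring with end index 3 on a string of length 2, which is consistent with it slicing a Java version string such as "10". You need to install JDK 8 and set the JAVA_HOME environment variable to point to it.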
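Before creating a session, it can help to confirm which Java version the driver would launch. The following is a minimal sketch, not part of PySpark: `parse_java_major` and `installed_java_major` are hypothetical helper names. It assumes that `java -version` prints a line containing `version "..."` to stderr, and that Spark's launcher prefers `$JAVA_HOME/bin/java` when JAVA_HOME is set.

```python
import os
import re
import subprocess


def _leading_int(text):
    """Return the integer at the start of text, e.g. '11-ea' gives 11."""
    match = re.match(r"\d+", text)
    if match is None:
        raise ValueError("no leading number in %r" % text)
    return int(match.group())


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("could not find a version string")
    parts = match.group(1).split(".")
    first = _leading_int(parts[0])
    # Legacy scheme: "1.8.0_171" means Java 8.
    # Newer scheme: "10.0.1" (or just "10") means Java 10.
    if first == 1 and len(parts) > 1:
        return _leading_int(parts[1])
    return first


def installed_java_major():
    """Ask the Java binary that Spark would most likely use for its version."""
    java_home = os.environ.get("JAVA_HOME")
    java = os.path.join(java_home, "bin", "java") if java_home else "java"
    # `java -version` writes to stderr, not stdout.
    result = subprocess.run(
        [java, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_java_major(result.stderr)


# Example use (requires a Java installation):
# if installed_java_major() != 8:
#     print("PySpark 2.3.1 needs JDK 8; check JAVA_HOME")
```

If the check reports anything other than 8, fix JAVA_HOME first rather than debugging the gateway error itself.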

If you are using Ubuntu (or another Debian-based distribution that provides apt-get):

Install JDK 8:
```
sudo apt-get install openjdk-8-jdk
```

Add the following line to your ~/.bashrc file, then open a new terminal (or run source ~/.bashrc) so that it takes effect:

export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
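A Jupyter kernel does not necessarily read ~/.bashrc, so the variable may be missing inside a notebook. One possible workaround, sketched here under assumptions, is to set it from Python before the first SparkSession is created in that process. `use_jdk` is a hypothetical helper name, and the path in the usage comment is the Ubuntu one from this answer; adjust it to your system.

```python
import os


def use_jdk(java_home):
    """Point this process (and the JVM that PySpark spawns from it) at a JDK."""
    java_bin = os.path.join(java_home, "bin", "java")
    if not os.path.isfile(java_bin):
        raise FileNotFoundError("no java executable at " + java_bin)
    os.environ["JAVA_HOME"] = java_home
    # Also put the JDK first on PATH for tools that ignore JAVA_HOME.
    os.environ["PATH"] = (
        os.path.join(java_home, "bin") + os.pathsep + os.environ.get("PATH", "")
    )


# Example use; this must run before any SparkSession exists in the process:
# use_jdk("/usr/lib/jvm/java-8-openjdk-amd64")
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName('test').getOrCreate()
```

Setting the variable after a gateway has already been launched has no effect on it, so restart the kernel if a session was attempted earlier.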

On Windows, install JDK 8 and set the JAVA_HOME environment variable to its installation directory.


score 0 · Answer 2 · answered Jul 19 '23 at 16:27

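On macOS, I had to do three things:

1. Install Java 8, e.g.:

brew install --cask adoptopenjdk/openjdk/adoptopenjdk8

2. Set JAVA_HOME in my ~/.zshrc, using either of these lines (the second lets macOS locate the JDK 8 home for you):

export JAVA_HOME='/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home'
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)

3. Remove any other JDK installs (NONJDK8.jdk is a placeholder for each non-8 directory that ls shows):

ls /Library/Java/JavaVirtualMachines
sudo rm -rf /Library/Java/JavaVirtualMachines/NONJDK8.jdk

The third step is important! It did not work until I removed the other JDK versions.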
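To see which installs would be affected before deleting anything, a short read-only listing can help. This is a sketch under assumptions: it expects each JDK bundle to ship a `release` file containing a `JAVA_VERSION="..."` line under `Contents/Home`, which is common but not guaranteed. `release_version` and `list_jdks` are hypothetical helper names.

```python
import os
import re

# Standard macOS JDK location, as used in the answer above.
JVM_DIR = "/Library/Java/JavaVirtualMachines"


def release_version(release_text):
    """Return the JAVA_VERSION value from a JDK `release` file, or None."""
    match = re.search(r'^JAVA_VERSION="([^"]+)"', release_text, re.MULTILINE)
    return match.group(1) if match else None


def list_jdks(jvm_dir=JVM_DIR):
    """Map each *.jdk bundle in jvm_dir to its reported version (None if unknown)."""
    found = {}
    if not os.path.isdir(jvm_dir):
        return found
    for name in sorted(os.listdir(jvm_dir)):
        if not name.endswith(".jdk"):
            continue
        release_path = os.path.join(jvm_dir, name, "Contents", "Home", "release")
        if os.path.isfile(release_path):
            with open(release_path) as handle:
                found[name] = release_version(handle.read())
        else:
            found[name] = None
    return found


if __name__ == "__main__":
    # Read-only report; nothing is deleted here.
    for name, version in list_jdks().items():
        is_jdk8 = bool(version) and version.startswith("1.8")
        print(name, version, "JDK 8" if is_jdk8 else "not JDK 8")
```

The script only reports; removal remains a manual step.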
