107

I'm trying to install Spark on my Mac. I've used Homebrew to install Spark 2.4.0 and Scala. I've installed PySpark in my Anaconda environment and am using PyCharm for development. I've added the following exports to my bash profile:

export SPARK_VERSION=`ls /usr/local/Cellar/apache-spark/ | sort | tail -1`
export SPARK_HOME="/usr/local/Cellar/apache-spark/$SPARK_VERSION/libexec"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH

However I'm unable to get it to work.

From reading the traceback, I suspect this is due to the Java version. I would really appreciate some help fixing the issue. Please comment if there is any information I could provide beyond the traceback that would be helpful.

I am getting the following error:

Traceback (most recent call last):
  File "<input>", line 4, in <module>
  File "/anaconda3/envs/coda/lib/python3.6/site-packages/pyspark/rdd.py", line 816, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/anaconda3/envs/coda/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/anaconda3/envs/coda/lib/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.IllegalArgumentException: Unsupported class file major version 55
    this fix worked for me even with "Unsupported class file major version 57" – SchwarzeHuhn Dec 14 '19 at 10:56
    __FIX:__ To fix this issue I edited the bash_profile to ensure java 1.8 is used as the global default as follows: `touch ~/.bash_profile; open ~/.bash_profile` Adding `export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)` and saving within text edit. – shbfy May 15 '20 at 09:53
  • That fix works for any Java on Mac. Libexec has nothing to do with licensing or oracle – OneCricketeer Jul 16 '20 at 14:52
  • Dependency hell for Spark. I hate it. – 0x4a6f4672 Nov 25 '20 at 12:33
  • @James Hi I followed your solution but when I type `java -version` in the PyCharm Terminal, it's still giving me `openjdk version "11.0.6" 2020-01-14 OpenJDK Runtime Environment (build 11.0.6+8-b765.1) ` – wawawa Feb 05 '21 at 12:13

11 Answers

111

Edit: Spark 3.0 supports Java 11, so you'll need to upgrade Spark

Spark runs on Java 8/11, Scala 2.12, Python 2.7+/3.4+ and R 3.1+. Java 8 prior to version 8u92 support is deprecated as of Spark 3.0.0



Original answer

Until Spark supports Java 11 or higher (which will hopefully be mentioned in the latest documentation when it does), you have to add a flag to set your Java version to Java 8.

As of Spark 2.4.x

Spark runs on Java 8, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.4 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x)

On Mac/Unix, see asdf-java for installing different Javas

On a Mac, I am able to do this in my .bashrc,

export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)

On Windows, check out Chocolatey, but seriously, just use WSL2 or Docker to run Spark.


You can also set this in spark-env.sh rather than setting the variable for your whole profile.

And, of course, this all means you'll need to install Java 8 in addition to your existing Java 11
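
To confirm which Java the Spark JVM actually picked up after setting JAVA_HOME, a quick sanity check from Python (a sketch; it simply reads the JVM's java.version property through py4j) looks like this:

from pyspark.sql import SparkSession

# Start a throwaway local session and ask the JVM which Java version it is actually running on.
spark = SparkSession.builder.master("local[*]").appName("java-check").getOrCreate()
print(spark.sparkContext._jvm.java.lang.System.getProperty("java.version"))  # expect something like 1.8.0_xxx once Java 8 is in use
spark.stop()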

OneCricketeer
    Thanks @cricket_007 when I try brew cask install java8 I get the following error Cask 'java8' is unavailable: No Cask with this name exists. – shbfy Dec 02 '18 at 18:38
    I've tried the following which seems to work: brew tap caskroom/versions brew cask install java8 – shbfy Dec 02 '18 at 18:40
    This appears to have fixed the issue, but not within PyCharm. Do I need to point to java within that also? Thanks! – shbfy Dec 02 '18 at 20:49
  • I haven't used PyCharm recently, but if you edited the bashrc, then quit PyCharm and reopen, it should use the other Java version. It's not clear how you're running the code, though – OneCricketeer Dec 02 '18 at 21:46
  • Thanks cricket. Doesn't appear to work / notice the change, even after a restart. I've edited the content roots to include PySpark and py4j. It works fine in the terminal now so clearly this solution works - can you set JAVA_HOME directly in PyCharm? I think it's using a different version as you mention in your post. Could it have anything to do with the fact that I'm running in an anaconda environment? Thanks again for your help :) – shbfy Dec 02 '18 at 22:45
  • I only know how to do it in Intellij, but there should be "runtime configurations" that allow to set environment variables – OneCricketeer Dec 03 '18 at 02:30
    Updated instructions for installing Java 8 JDK on macOS: "brew tap AdoptOpenJDK/openjdk; brew cask install adoptopenjdk8" – Joris Jul 09 '19 at 15:53
    @James `fatal: repository 'brew' does not exist` when i do `brew tap caskroom/versions brew cask install java8` – Gonzalo Garcia Sep 09 '19 at 15:11
  • @GonzaloGarcia can you confirm you are using the latest version of `brew`? – shbfy Sep 09 '19 at 15:26
    @James thanks for answering back, I solved by updating some git credentials. anyways java8 is no longer available because Oracle set the license on register first. So that approach doesn't work anymore. In order to install java8 u need to see this answer. https://stackoverflow.com/questions/24342886/how-to-install-java-8-on-mac/55774255#55774255 – Gonzalo Garcia Sep 09 '19 at 15:30
  • @GonzaloGarcia thanks! I'll add it to the fix at the top. – shbfy Sep 09 '19 at 15:31
  • "Until Spark is compiled to support Java 11...". Your first explanation is confusing, it seems like some operations are not supported by java 11 – Simon30 Sep 11 '19 at 14:34
  • @Simon30 Some options *are not* supported because Spark is not built/compiled with Java 11 support. *Until it is*, then use Java 8. – OneCricketeer Sep 11 '19 at 14:50
  • I think you missed the "not" in your sentence which is what confused me: "Until Spark is compiled to support Java 11" is supposed to be "Until Spark is NOT compiled to support Java 11" ? – Simon30 Sep 11 '19 at 15:08
  • @Simon30 Maybe it's a difference in English? Another way to read it is "Until Spark supports Java 11..." ... It currently does not, and you must do something else until it does support it. – OneCricketeer Sep 11 '19 at 15:15
  • Ah yes I see what you meant I guess I confused myself because of my poor english I'm sorry and thank you for your answers ^^ – Simon30 Sep 12 '19 at 10:40
  • I used jEnv to configure both Java 8 and 11 – Mark J Miller May 19 '21 at 16:07
98

I ran into this issue when running Jupyter Notebook and Spark using Java 11. I installed and configured for Java 8 using the following steps.

Install Java 8:

$ sudo apt install openjdk-8-jdk

Since I had already installed Java 11, I then set my default Java to version 8 using:

$ sudo update-alternatives --config java

Select Java 8 and then confirm your changes:

$ java -version

Output should be similar to:

openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

I'm now able to run Spark successfully in Jupyter Notebook. The steps above were based on the following guide: https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-on-ubuntu-18-04
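
To confirm the fix from a notebook cell, a minimal smoke test (a sketch; assumes pyspark is installed in the kernel's environment) is to repeat the kind of call that failed originally:

from pyspark import SparkContext

# collect() is the call that previously raised "Unsupported class file major version 55".
sc = SparkContext("local[*]", "smoke-test")
print(sc.parallelize(range(5)).collect())  # expect [0, 1, 2, 3, 4]
sc.stop()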

Andre Oporto
    if you are using sdkman, `sdk install java 8.0.212-zulu` installs java 8 and asks if you want to use the installed java 8 to be the default java – XoXo May 31 '19 at 13:47
  • Thank you! my case was exactly the same as yours. – Kenny Aires Apr 09 '20 at 03:07
  • Hi I found myself using java 11, so I guess I have to install java 8, but I'm using windows + Pycharm, is there an instruction that I can follow? Many thanks. – wawawa Feb 05 '21 at 10:35
    @Cecilia I run Windows myself, but for Spark I have only run it in a Virtual Machine or on AWS. That's my preferred approach especially due to java requirements. I find it makes setup *and* teardown simpler in the long run vs. running on your local machine. – Andre Oporto Feb 06 '21 at 18:52
  • Hi @AndreOporto I've given up the setup for windows, I've started trying to run pyspark in AWS, just wondering if you're using Glue or something else? Any suggested instruction articles? Many thanks. – wawawa Feb 07 '21 at 12:07
22

I found that setting the Spark location through findspark and Java 8 through os.environ at the beginning of the script was the easiest solution:

import findspark
import os
spark_location='/opt/spark-2.4.3/' # Set your own
java8_location= '/usr/lib/jvm/java-8-openjdk-amd64' # Set your own
os.environ['JAVA_HOME'] = java8_location
findspark.init(spark_home=spark_location) 
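
Once findspark.init() has run, the usual PySpark imports work as normal; for example (a sketch, assuming the example paths above are adjusted to your machine):

from pyspark.sql import SparkSession

# findspark has put Spark's Python libraries on sys.path and JAVA_HOME now points at Java 8.
spark = SparkSession.builder.master("local[*]").getOrCreate()
print(spark.range(3).collect())  # [Row(id=0), Row(id=1), Row(id=2)]
spark.stop()
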
Ferran
7

The problem here is that PySpark requires Java 8 for some functions. Spark 2.2.1 was having problems with Java 9 and beyond. The recommended solution was to install Java 8.

You can install Java 8 specifically, set it as your default Java, and try again.

To install Java 8:

sudo apt install openjdk-8-jdk

To change the default Java version, follow this. You can use the command

 update-java-alternatives --list

to list all available Java versions.

Set a default one by running the command:

sudo update-alternatives --config java

and select the Java version you want by entering the corresponding number from the list. Then check your Java version with java -version; it should be updated. Set the JAVA_HOME variable as well.

To set JAVA_HOME, you must find the specific Java version and folder. Follow this SO discussion to get a full idea of setting the JAVA_HOME variable. Since we are going to use Java 8, our folder path is /usr/lib/jvm/java-8-openjdk-amd64/. Just go to the /usr/lib/jvm folder and check which folders are available. Use ls -l to see the folders and their symlinks, since these folders can be shortcuts for some Java versions. Then go to your home directory (cd ~) and edit the .bashrc file:

cd ~
gedit .bashrc

Then add the lines below to the file, save, and exit.

## SETTING JAVA HOME
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin

After that, for the changes to take effect, run source ~/.bashrc in the terminal.
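
To verify from Python that your scripts will see the new default, a small check along these lines (a sketch) can help:

import os
import subprocess

# Print the JAVA_HOME the process inherited and the version of the java binary found on PATH.
print(os.environ.get("JAVA_HOME"))
subprocess.run(["java", "-version"])  # note: java prints its version to stderr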

Rajitha Fernando
4

On Windows (Windows 10) you can solve the issue by installing jdk-8u201-windows-x64.exe and resetting the system environment variable to the correct version of the Java JDK:

JAVA_HOME -> C:\Program Files\Java\jdk1.8.0_201.

Don't forget to restart the terminal otherwise the resetting of the environment variable does not kick in.

  • please don't forget to restart the terminal! – rishi jain Oct 16 '19 at 13:45
  • Hi I've added C:\Program Files\Java\jdk1.8.0_221 in my system env variables, but when I check `java -version` in Pycharm, it still gives `openjdk version "11.0.6" 2020-01-14 OpenJDK Runtime Environment (build 11.0.6+8-b765.1)` – wawawa Feb 05 '21 at 12:23
3

Just wanted to add my two cents here, as it will save several hours of time for people who are using PyCharm (especially the run configuration). After changing your .bashrc or .bash_profile to point to Java 8 by modifying the JAVA_HOME and PATH env variables (as most people here have recommended), you'll notice that when you run Spark using a PyCharm run configuration, it will still not pick up the right Java. It looks like there is some issue with PyCharm (I'm using PyCharm Professional 2020.2 on macOS Catalina). Additionally, when you run it using the PyCharm terminal, it works fine; that confirms something is wrong with PyCharm. In order for the PyCharm run configuration to pick up the new Java, I had to explicitly add the JAVA_HOME environment variable in the run configuration's Environment Variables dialog,

and it worked!

Another option that also works is checking the Include system environment variables option in the same Environment Variables window of the run configuration and restarting PyCharm.
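
If you'd rather not edit every run configuration, a workaround in the spirit of the findspark answer above (a sketch; the Java 8 path shown is an example and will differ per machine) is to set JAVA_HOME at the top of the script, before Spark launches the JVM:

import os

# Example Java 8 location; on a Mac, use the output of `/usr/libexec/java_home -v 1.8`.
os.environ["JAVA_HOME"] = "/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home"

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()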

Heapify
2

For Debian 10 'buster' users, Java 8 JRE is available in the nvidia-openjdk-8-jre package.

Install it with

sudo apt install nvidia-openjdk-8-jre

Then set JAVA_HOME when running pyspark, e.g.:

JAVA_HOME=/usr/lib/jvm/nvidia-java-8-openjdk-amd64/ pyspark
SergiyKolesnikov
0

I had the same issue on Windows, and I added JAVA_HOME to the environment variables:

JAVA_HOME: C:\Program Files\Java\jdk-11.0.1

Chaymae Ahmed
    Hi, I have done the same. Still I am getting the same error. Is there anything else you changed? C:\Program Files\Java\jdk-11.0.2 – Gautam Mar 16 '19 at 10:25
  • @Gautum As the other answers show, you need Java 8. The error explicitly says version 55 (which is Java 11) isn't supported – OneCricketeer Apr 30 '19 at 13:13
0

To be sure that you are setting the right SPARK_HOME path, you can use this Python script to locate it: https://github.com/apache/spark/blob/master/python/pyspark/find_spark_home.py

python3 find_spark_home.py 

/usr/local/lib/python3.7/site-packages/pyspark

On my Mac, in the terminal:

vim ~/.bashrc

and add the paths:

export JAVA_HOME=/Library/java/JavaVirtualMachines/adoptopenjdk-8.jdk/contents/Home/

export SPARK_HOME=/usr/local/lib/python3.7/site-packages/pyspark

export PYSPARK_PYTHON=/usr/local/bin/python3

and then finally, to apply the changes:

source ~/.bashrc
ak6o
0

This issue occurs due to the Java version you have set in the JAVA_HOME environment variable.

Old JAVA_HOME path: /usr/lib/jvm/java-1.11.0-openjdk-amd64

Solution: Set JAVA_HOME to /usr/lib/jvm/java-8-openjdk-amd64

It will work!!!

Note: my error was:

File "/home/tms/myInstallDir/spark-2.4.5-bin-hadoop2.7/python/pyspark/rdd.py", line 816, in collect sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd()) File "/home/tms/myInstallDir/spark-2.4.5-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in call File "/home/tms/myInstallDir/spark-2.4.5-bin-hadoop2.7/python/pyspark/sql/utils.py", line 79, in deco raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace) pyspark.sql.utils.IllegalArgumentException: u'Unsupported class file major version 55'

Tanaji Sutar
0

On macOS: install Java 8 on your laptop using the following commands:

brew tap AdoptOpenJDK/openjdk
brew cask install adoptopenjdk8
ijoseph