40

I've downloaded the prebuilt version of Spark 1.4.0 without Hadoop (with user-provided Hadoop). When I ran the spark-shell command, I got this error:

> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
        at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
        at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:111)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:97)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:106)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 7 more

I've searched on the Internet; it is said that HADOOP_HOME has not been set in spark-env.cmd. But I cannot find spark-env.cmd in the Spark installation folder. I've traced the spark-shell command and it seems there is no HADOOP_CONFIG in there. I've tried adding HADOOP_HOME as an environment variable, but it still gives the same exception.

Actually I'm not really using Hadoop. I downloaded Hadoop as a workaround, as suggested in this question.

I am using Windows 8 and Scala 2.10.

Any help will be appreciated. Thanks.

David

17 Answers

46

The "without Hadoop" in the Spark's build name is misleading: it means the build is not tied to a specific Hadoop distribution, not that it is meant to run without it: the user should indicate where to find Hadoop (see https://spark.apache.org/docs/latest/hadoop-provided.html)

One clean way to fix this issue is to:

  1. Obtain Hadoop Windows binaries. Ideally build them, but this is painful (for some hints see: Hadoop on Windows Building/Installation Error). Otherwise Google some up, for instance currently you can download 2.6.0 from here: http://www.barik.net/archive/2015/01/19/172716/
  2. Create a spark-env.cmd file looking like this (modify the Hadoop path to match your installation; see the sketch after this list for a way to fill in the classpath automatically):

     @echo off
     set HADOOP_HOME=D:\Utils\hadoop-2.7.1
     set PATH=%HADOOP_HOME%\bin;%PATH%
     set SPARK_DIST_CLASSPATH=<paste here the output of %HADOOP_HOME%\bin\hadoop classpath>
  3. Put this spark-env.cmd either in a conf folder located at the same level as your Spark base folder (which may look weird), or in a folder indicated by the SPARK_CONF_DIR environment variable.
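
Not part of the answer above, just a hedged sketch: the manual copy/paste in step 2 can probably be avoided, because inside a .cmd file a for /f loop can capture the output of a command into a variable. Assuming HADOOP_HOME points to a path without spaces, a spark-env.cmd along these lines should be equivalent:

@echo off
rem Sketch of an alternative spark-env.cmd: capture the output of "hadoop classpath"
rem instead of pasting it in by hand. Adjust the Hadoop path to your installation.
set HADOOP_HOME=D:\Utils\hadoop-2.7.1
set PATH=%HADOOP_HOME%\bin;%PATH%
for /f "delims=" %%i in ('%HADOOP_HOME%\bin\hadoop classpath') do set SPARK_DIST_CLASSPATH=%%i
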
tiho
  • I followed the steps described in https://spark.apache.org/docs/latest/hadoop-provided.html and chose the 1st case, with the 'hadoop' binary on my PATH, but I'm still getting the same issue. I've already added HADOOP_HOME to my environment variables, which is why I don't see any reason to set this variable in the script. Could you give me some idea why this problem still appears? – Ray Aug 11 '15 at 16:51
  • I understand my problem now, but I have one question: is it possible to invoke $(hadoop classpath) in a Windows cmd file in the same way as in a Linux bash file? – Ray Aug 11 '15 at 17:57
  • @Ray wish I could help, but I'm no expert, all I can say is the above worked for me... I'm sure there's a way to automatically call "hadoop classpath" in spark-env.cmd to avoid the manual copy/paste, but I didn't look into it. Good luck! – tiho Aug 25 '15 at 18:39
  • It should work; the classpath is defined here: https://github.com/apache/spark/blob/284e29a870bbb62f59988a5d88cd12f1b0b6f9d3/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java#L213 – Thomas Decaux Dec 26 '15 at 22:11
21

I had the same problem; in fact, it's mentioned on Spark's Getting Started page how to handle it:

### in conf/spark-env.sh ###

# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# With explicit path to 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)

# Passing a Hadoop configuration directory
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)

If you want to use your own Hadoop, follow one of the three options above and copy and paste it into the spark-env.sh file:

1 - if you have the hadoop binary on your PATH

2 - if you want to point to the hadoop binary explicitly

3 - if you want to point to a Hadoop configuration folder

http://spark.apache.org/docs/latest/hadoop-provided.html

Hamed MP
21

I too had this issue;

export SPARK_DIST_CLASSPATH=`hadoop classpath`

resolved it.

Jimson James
5

I ran into the same error when trying to get familiar with Spark. My understanding of the error message is that while Spark doesn't need a Hadoop cluster to run, it does need some of the Hadoop classes. Since I was just playing around with Spark and didn't care which version of the Hadoop libraries was used, I just downloaded a Spark binary pre-built with a version of Hadoop (2.6) and things started working fine.

Rohith
5

Linux:

ENV SPARK_DIST_CLASSPATH="$HADOOP_HOME/etc/hadoop/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/tools/lib/*"

Windows:

set SPARK_DIST_CLASSPATH=%HADOOP_HOME%\etc\hadoop\*;%HADOOP_HOME%\share\hadoop\common\lib\*;%HADOOP_HOME%\share\hadoop\common\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\hdfs\lib\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\yarn\lib\*;%HADOOP_HOME%\share\hadoop\yarn\*;%HADOOP_HOME%\share\hadoop\mapreduce\lib\*;%HADOOP_HOME%\share\hadoop\mapreduce\*;%HADOOP_HOME%\share\hadoop\tools\lib\*
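
A hypothetical way to try the Windows variant (the installation paths below are placeholders, not from the answer): put both settings in a small wrapper script and then start the shell that ships with Spark:

@echo off
rem Hypothetical wrapper script; adjust both installation paths to your machine.
set HADOOP_HOME=C:\hadoop-2.6.0
set SPARK_DIST_CLASSPATH=%HADOOP_HOME%\etc\hadoop\*;%HADOOP_HOME%\share\hadoop\common\lib\*;%HADOOP_HOME%\share\hadoop\common\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\hdfs\lib\*;%HADOOP_HOME%\share\hadoop\yarn\lib\*;%HADOOP_HOME%\share\hadoop\yarn\*;%HADOOP_HOME%\share\hadoop\mapreduce\lib\*;%HADOOP_HOME%\share\hadoop\mapreduce\*;%HADOOP_HOME%\share\hadoop\tools\lib\*
call C:\spark\bin\spark-shell.cmd
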
4

Go into SPARK_HOME -> conf.

Copy the spark-env.sh.template file and rename it to spark-env.sh. Inside this file you can set the parameters for Spark.

Kshitij Kulshrestha
  • Thanks for your response. Any idea which variable I should modify? I've renamed it to spark-env.cmd too, because I am using Windows, and added "set HADOOP_CONF_DIR=*path", but it gives the same exception. – David Jun 18 '15 at 08:12
  • First, try to set HADOOP_HOME in spark-env.cmd and rerun it; if the same problem persists, then add the hadoop-common jar file to the Spark classpath in the same configuration file. – Kshitij Kulshrestha Jun 18 '15 at 08:17
  • I've set HADOOP_HOME and copied hadoop-common.jar to both the lib and conf folders, but still no luck :( – David Jun 18 '15 at 08:52
  • Don't just copy the jar; pass its classpath too, e.g. export SPARK_CLASSPATH=$SPARK_CLASSPATH:hadoop-common.jar – Kshitij Kulshrestha Jun 18 '15 at 09:01
3

Run the command below from your package directory, just before running spark-submit:

export SPARK_DIST_CLASSPATH=`hadoop classpath`
Martin
1

I finally found a solution that removes the exception.

In spark-class2.cmd, add:

set HADOOP_CLASS1=%HADOOP_HOME%\share\hadoop\common\*
set HADOOP_CLASS2=%HADOOP_HOME%\share\hadoop\common\lib\*
set HADOOP_CLASS3=%HADOOP_HOME%\share\hadoop\mapreduce\*
set HADOOP_CLASS4=%HADOOP_HOME%\share\hadoop\mapreduce\lib\*
set HADOOP_CLASS5=%HADOOP_HOME%\share\hadoop\yarn\*
set HADOOP_CLASS6=%HADOOP_HOME%\share\hadoop\yarn\lib\*
set HADOOP_CLASS7=%HADOOP_HOME%\share\hadoop\hdfs\*
set HADOOP_CLASS8=%HADOOP_HOME%\share\hadoop\hdfs\lib\*

set CLASSPATH=%HADOOP_CLASS1%;%HADOOP_CLASS2%;%HADOOP_CLASS3%;%HADOOP_CLASS4%;%HADOOP_CLASS5%;%HADOOP_CLASS6%;%HADOOP_CLASS7%;%HADOOP_CLASS8%;%LAUNCH_CLASSPATH%

Then, change:

"%RUNNER%" -cp %CLASSPATH%;%LAUNCH_CLASSPATH% org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%

to:

"%RUNNER%" -Dhadoop.home.dir=*hadoop-installation-folder* -cp %CLASSPATH% %JAVA_OPTS% %*

It works fine for me, but I'm not sure this is the best solution.

David
1

You should add these jars to your code:

  1. commons-cli-1.2.jar
  2. hadoop-common-2.7.2.jar
Alex Bravo
0

Thank you so much. That worked great, but I had to add the Spark jars to the classpath as well: ;c:\spark\lib\* Also, the last line of the cmd file is missing the word "echo", so it should say: echo %SPARK_CMD%

Emul
0

I had the same issue: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)... Then I realized that I had installed the Spark version without Hadoop. I installed the "with-hadoop" version and the problem went away.

Mayukh
0

In my case:

Running a Spark job locally differs from running it on a cluster. On a cluster you might have a different dependency/context to follow, so in your pom.xml you might have dependencies declared as provided.

When running locally, you don't need these provided dependencies; just un-comment them and rebuild.

zinking
0

I encountered the same error. I wanted to install Spark on my Windows PC and therefore downloaded the without-Hadoop version of Spark, but it turns out you need the Hadoop libraries! So download a Spark version pre-built for Hadoop and set the environment variables, as in the sketch below.
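
For example (a hypothetical sketch; the folder names are placeholders, not from this answer), after extracting a package pre-built for Hadoop, pointing the environment at it is enough to start the shell:

rem Hypothetical setup for a Spark download that already bundles the Hadoop classes.
set SPARK_HOME=C:\spark-1.4.0-bin-hadoop2.6
set PATH=%SPARK_HOME%\bin;%PATH%
spark-shell
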

0

I got this error because the file was copied from Windows. Resolve it using

dos2unix file_name
Shikkou
0

I think you need the spark-core dependency from Maven. It worked fine for me.

hardik
0

I used:

export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
export HADOOP_MAPRED_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce

It works for me!

0

I added hadoop-client-runtime-3.3.2.jar to my user library.

MustardMan