
I'm working on a Spark Java application that should load a JSON file. The application successfully loads an Avro file but fails with the error below when loading the JSON file.

sparksession.read().json(hdfs_path_of_file)                 // loading the JSON file fails
sparksession.read().format("avro").load(hdfs_path_of_file)  // loading the Avro file works

java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSBuilder
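
For context, a minimal self-contained sketch of the two reads (the app name and HDFS paths are placeholder assumptions, not from the original post):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonVsAvroLoad {
    public static void main(String[] args) {
        SparkSession sparksession = SparkSession.builder()
                .appName("json-vs-avro-load")   // placeholder app name
                .getOrCreate();

        // Placeholder paths standing in for hdfs_path_of_file above
        String avroPath = "hdfs:///data/in/events.avro";
        String jsonPath = "hdfs:///data/in/events.json";

        // Works: the Avro source loads fine
        Dataset<Row> avroDf = sparksession.read().format("avro").load(avroPath);
        avroDf.printSchema();

        // Fails with java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSBuilder
        Dataset<Row> jsonDf = sparksession.read().json(jsonPath);
        jsonDf.show();
    }
}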

My dependencies are:

<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.0.0</version>
<exclusions>
    <exclusion>
        <artifactId>slf4j-log4j12</artifactId>
        <groupId>org.slf4j</groupId>
    </exclusion>
    <exclusion>
        <groupId>commons-beanutils</groupId>
        <artifactId>commons-beanutils</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.mortbay.jetty</groupId>
        <artifactId>jetty</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-compress</artifactId>
    </exclusion>
    <exclusion>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
    </exclusion>
    <exclusion>
        <groupId>commons-net</groupId>
        <artifactId>commons-net</artifactId>
    </exclusion>
    <exclusion>
        <groupId>commons-cli</groupId>
        <artifactId>commons-cli</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.codehaus.jackson</groupId>
        <artifactId>jackson-mapper-asl</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>jetty-webapp</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.apache.zookeeper</groupId>
        <artifactId>zookeeper</artifactId>
    </exclusion>
    <exclusion>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
    </exclusion>
    <exclusion>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
    </exclusion>
    <exclusion>
        <groupId>com.fasterxml.woodstox</groupId>
        <artifactId>woodstox-core</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.eclipse.jetty</groupId>
        <artifactId>jetty-server</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.codehaus.woodstox</groupId>
        <artifactId>stax2-api</artifactId>
    </exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>3.0.0</version>
<exclusions>
    <exclusion>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
    </exclusion>
    <exclusion>
        <groupId>io.netty</groupId>
        <artifactId>netty</artifactId>
    </exclusion>
    <exclusion>
        <groupId>commons-cli</groupId>
        <artifactId>commons-cli</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs-client</artifactId>
    </exclusion>
    <exclusion>
        <groupId>xerces</groupId>
        <artifactId>xercesImpl</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.mortbay.jetty</groupId>
        <artifactId>jetty</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.mortbay.jetty</groupId>
        <artifactId>jetty-util</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.codehaus.jackson</groupId>
        <artifactId>jackson-mapper-asl</artifactId>
    </exclusion>
    <exclusion>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
    </exclusion>
    <exclusion>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
    </exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs-client</artifactId>
<version>3.0.0</version>
<exclusions>
    <exclusion>
        <groupId>com.squareup.okhttp</groupId>
        <artifactId>okhttp</artifactId>
    </exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.4.7.7.1.7.48-2</version>
<exclusions>
    <exclusion>
        <groupId>com.squareup.okhttp</groupId>
        <artifactId>okhttp</artifactId>
    </exclusion>
    <exclusion>
        <artifactId>slf4j-log4j12</artifactId>
        <groupId>org.slf4j</groupId>
    </exclusion>
    <exclusion>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-compress</artifactId>
    </exclusion>
    <exclusion>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
    </exclusion>
    <exclusion>
        <groupId>commons-cli</groupId>
        <artifactId>commons-cli</artifactId>
    </exclusion>
    <exclusion>
        <groupId>commons-net</groupId>
        <artifactId>commons-net</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.codehaus.jackson</groupId>
        <artifactId>jackson-mapper-asl</artifactId>
    </exclusion>
    <exclusion>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
    </exclusion>
    <exclusion>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs-client</artifactId>
    </exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.4.7.7.1.7.48-2</version>
<exclusions>
    <exclusion>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.codehaus.jackson</groupId>
        <artifactId>jackson-mapper-asl</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>commons-compiler</artifactId>
    </exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.11</artifactId>
<version>2.4.7.7.1.7.48-2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.4.7.7.1.7.48-2</version>
<scope>provided</scope>
<exclusions>
    <exclusion>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
    </exclusion>
</exclusions>
</dependency>
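
As an aside, one way to see which Hadoop version actually wins on the classpath (2.4.7.7.1.7.48-2 is a Cloudera build, so its transitive Hadoop artifacts may be newer than the hand-pinned 3.0.0 ones) is Maven's dependency tree, filtered to the Hadoop group:

mvn dependency:tree -Dincludes=org.apache.hadoop

If the Cloudera Spark jars reference org.apache.hadoop.fs.FSBuilder from their own patched hadoop-common while a hadoop-common that lacks it wins at runtime, this NoClassDefFoundError is the expected symptom.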

Why can't FSBuilder be found even though hadoop-common is on the classpath? Also, FSBuilder is an interface, not a class, but I assume the JVM classloader loads interfaces just as it loads classes.
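
(A minimal runtime probe for that, using plain JDK reflection; the class name comes from the error above:)

public class FsBuilderProbe {
    public static void main(String[] args) {
        try {
            // Class.forName loads interfaces exactly like classes
            Class<?> c = Class.forName("org.apache.hadoop.fs.FSBuilder");
            System.out.println("Loaded " + c.getName() + ", isInterface=" + c.isInterface());
        } catch (ClassNotFoundException e) {
            // Thrown when the named type itself is absent. The question's
            // NoClassDefFoundError instead surfaces while *another* class that
            // references the missing type is being linked.
            System.out.println("Not on the runtime classpath: " + e.getMessage());
        }
    }
}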

Prateek Gautam
  • If you're working on Spark, then add Spark dependencies. Those already include the HDFS client as a transitive dependency... You don't need the rest of your dependencies here to run the code you've shown – OneCricketeer Nov 08 '22 at 14:09
  • We have added the Spark dependencies as well: spark-core, spark-sql, spark-avro, and spark-hive – Prateek Gautam Nov 08 '22 at 16:57
  • These dependencies were already there, but I still don't understand why it can't find a particular class – Prateek Gautam Nov 08 '22 at 16:58
  • Worth noting that `NoClassDefFoundError` is not the same as `ClassNotFoundException`: https://stackoverflow.com/questions/28322833/classnotfoundexception-vs-noclassdeffounderror. TL;DR: the former is often seen when **the dependencies** of a class being loaded are missing from the classpath at runtime, not the class itself. – mazaneicha Nov 08 '22 at 18:22
  • Please show your complete POM as a [mcve] if you do have Spark dependencies. Also, what Spark version do you have installed? – OneCricketeer Nov 08 '22 at 20:41
  • @OneCricketeer I've added the Spark dependencies as well in my post. My POM is 800 lines long; I highly doubt I can paste that entire thing here – Prateek Gautam Nov 09 '22 at 05:10
  • Thanks, so I suspect your problem is that Spark 2.4.7 isn't using Hadoop 3 libraries, so manually adding them is causing conflicts. As I stated before, Spark includes HDFS dependencies itself, so what exactly is the error you're getting when you remove the ones you've added? Also, you should add provided scope to the Spark dependencies (and I highly doubt you should be excluding so much unless you want other class-not-found exceptions) – OneCricketeer Nov 09 '22 at 15:42
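
To make the last comment's suggestion concrete, here is a sketch of the trimmed dependency section it implies: the explicit hadoop-* artifacts dropped, and the Spark artifacts (same versions as in the question) marked provided so the cluster's own jars are used at runtime. This only illustrates the comment; it is not a verified fix:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.7.7.1.7.48-2</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.7.7.1.7.48-2</version>
    <scope>provided</scope>
</dependency>
<!-- spark-avro may need to stay compile-scoped if the cluster does not ship it -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-avro_2.11</artifactId>
    <version>2.4.7.7.1.7.48-2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.7.7.1.7.48-2</version>
    <scope>provided</scope>
</dependency>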

0 Answers