
I am trying to write a Kafka consumer in Java using Apache Spark. The code fails at runtime with a Log4jController error, and I don't know what I am missing.

The relevant dependencies in my pom.xml are as follows:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.25</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.3.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
    <version>2.3.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>1.0.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>

I got the following error:

5645 [dag-scheduler-event-loop] INFO  org.apache.spark.scheduler.DAGScheduler  - ResultStage 11 (start at RuleEngine.java:431) failed in 0.094 s due to Job aborted due to stage failure: Task 0 in stage 11.0 failed 1 times, most recent failure: Lost task 0.0 in stage 11.0 (TID 8, localhost, executor driver): java.lang.NoClassDefFoundError: Could not initialize class kafka.utils.Log4jController$
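
For reference, the consumer I am writing is essentially the following (a minimal sketch of the 0.8 receiver-based API; the topic name, ZooKeeper address, and group id are placeholders, not the actual values from RuleEngine.java):

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class Kafka08Consumer {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("Kafka08Consumer").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

        // topic name -> number of receiver threads for that topic
        Map<String, Integer> topics = new HashMap<>();
        topics.put("my-topic", 1);

        // Receiver-based 0.8 API: connects through ZooKeeper, not the brokers.
        JavaPairReceiverInputDStream<String, String> stream =
            KafkaUtils.createStream(jssc, "localhost:2181", "my-group", topics);

        // Each element is a (key, value) pair; keep just the message value.
        stream.map(tuple -> tuple._2()).print();

        jssc.start();
        jssc.awaitTermination();
    }
}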

Edit:

I was able to resolve the issue by changing the Kafka client version in pom.xml:

<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.3.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>0.8.2.2</version>
</dependency>

Poojaa Karaande

2 Answers


Examining your pom, the problem appears to be that you are using Kafka 1.0.0 together with spark-streaming-kafka-0-8, which expects Kafka 0.8. Indeed, searching for kafka.utils.Log4jController shows it was part of the Kafka client library in versions 0.8.1 and 0.8.2 but not in later versions. I'm no expert on Spark, but I think you just need to find a version of the spark-streaming-kafka library that matches your Kafka version. Hope that helps.
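
For example, if you want to stay on the Kafka 1.0.0 client rather than downgrade, the spark-streaming-kafka-0-10 artifact is the one built against the newer client API. A minimal sketch of the direct-stream consumer it provides (broker address, topic, and group id are placeholders):

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class Kafka010Consumer {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("Kafka010Consumer").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

        // Consumer configuration is passed straight through to the 0.10+ Kafka client.
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "my-group");
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream: no receiver; one Kafka partition maps to one Spark partition.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Arrays.asList("my-topic"), kafkaParams));

        stream.map(ConsumerRecord::value).print();

        jssc.start();
        jssc.awaitTermination();
    }
}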

Michal Borowiecki

Your main problem is the NoClassDefFoundError itself.

How to resolve java.lang.NoClassDefFoundError:

  1. The class is not available on the Java classpath.
  2. You might be running your program with the jar command and the class was not listed in the manifest file's Class-Path attribute.
  3. A start-up script is overriding the Classpath environment variable.
  4. Because NoClassDefFoundError is a subclass of java.lang.LinkageError, it can also occur if one of the class's dependencies, such as a native library, is not available.
  5. Check for java.lang.ExceptionInInitializerError in your log file; NoClassDefFoundError due to a failed static initializer is quite common (see the sketch after this list).
  6. If you are working in a J2EE environment, visibility of a class among multiple classloaders can also cause java.lang.NoClassDefFoundError; see the linked article's examples and scenarios section for a detailed discussion.
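
Point 5 is exactly what "Could not initialize class kafka.utils.Log4jController$" indicates: the class file was on the classpath, but its static initialization already failed once (throwing ExceptionInInitializerError), so every later use of the class fails with NoClassDefFoundError. A minimal illustration with a made-up class:

// A class whose static initializer throws.
class Unlucky {
    static final int VALUE = compute();

    private static int compute() {
        throw new RuntimeException("boom during static init");
    }
}

public class StaticInitDemo {
    public static void main(String[] args) {
        try {
            // First touch triggers class initialization, which fails:
            // java.lang.ExceptionInInitializerError
            System.out.println(Unlucky.VALUE);
        } catch (Throwable t) {
            System.out.println("first use:  " + t);
        }
        try {
            // The class is now marked erroneous; every later touch throws:
            // java.lang.NoClassDefFoundError: Could not initialize class Unlucky
            System.out.println(Unlucky.VALUE);
        } catch (Throwable t) {
            System.out.println("second use: " + t);
        }
    }
}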

You can follow this link for more detail: NoClassDefFoundError