
I get warning messages and cannot access the Spark monitoring REST API or some tabs of the Web UI (Stages, Executors, etc.) when I run my application packaged with the Confluent Schema Registry, Avro Serializer, and Spark jars.

Part of the logs:

...
Caused by: A MultiException has 3 exceptions.  They are:
1. javax.validation.ValidationException: HV000183: Unable to initialize 'javax.el.ExpressionFactory'. Check that you have the EL dependencies on the classpath, or use ParameterMessageInterpolator instead
2. java.lang.IllegalArgumentException: While attempting to resolve the dependencies of org.glassfish.jersey.server.validation.internal.ValidationBinder$ConfiguredValidatorProvider errors were found
3. java.lang.IllegalStateException: Unable to perform operation: resolve on org.glassfish.jersey.server.validation.internal.ValidationBinder$ConfiguredValidatorProvider
...

And here is my build.sbt file:

name := "Kafka2Delta"
version := "0.1"
scalaVersion := "2.12.12"

val confluent = "5.4.1"
val spark = "3.0.1"
val stocator = "1.1.3"
val delta = "0.7.0"

resolvers ++= Seq(
  "confluent" at "https://packages.confluent.io/maven/"
)

libraryDependencies ++= Seq(
  "io.confluent" % "kafka-schema-registry" % confluent,
  "io.confluent" % "kafka-avro-serializer" % confluent,
  "io.delta" %% "delta-core" % delta,
  "org.apache.spark" %% "spark-sql" % spark % Provided,
  "org.apache.spark" %% "spark-sql-kafka-0-10" % spark,
  "org.apache.spark" %% "spark-avro" % spark,
  "com.ibm.stocator" % "stocator" % stocator % Provided
)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "services", xs@_*) => MergeStrategy.filterDistinctLines
  case PathList("META-INF", xs@_*) => MergeStrategy.discard
  case "application.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}

I suspect there is a conflict between the Spark packages and the Kafka/Confluent packages, but I cannot find it. When I compile the project I get some dependency-conflict warnings, but I have not confirmed that they are related to the issue above.

[IJ]evicted
[warn] build source files have changed
[warn] modified files: 
[warn]   /Users/timothyzhang/IdeaProjects/Kafka2Delta/build.sbt
[warn] Apply these changes by running `reload`.
[warn] Automatically reload the build when source changes are detected by setting `Global / onChangedBuildSource := ReloadOnSourceChanges`.
[warn] Disable this warning by setting `Global / onChangedBuildSource := IgnoreSourceChanges`.
[warn] Found version conflict(s) in library dependencies; some are suspected to be binary incompatible:
[warn]  * org.apache.kafka:kafka-clients:5.4.1-ccs is selected over {2.4.1, 2.4.1}
[warn]      +- org.apache.kafka:kafka_2.12:5.4.1-ccs              (depends on 5.4.1-ccs)
[warn]      +- io.confluent:rest-utils:5.4.1                      (depends on 5.4.1-ccs)
[warn]      +- io.confluent:kafka-schema-registry:5.4.1           (depends on 5.4.1-ccs)
[warn]      +- io.confluent:kafka-schema-registry-client:5.4.1    (depends on 5.4.1-ccs)
[warn]      +- org.apache.spark:spark-token-provider-kafka-0-10_2.12:3.0.1 (depends on 2.4.1)
[warn]      +- org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1   (depends on 2.4.1)
[warn]  * javax.validation:validation-api:2.0.1.Final is selected over 1.1.0.Final
[warn]      +- org.hibernate.validator:hibernate-validator:6.0.11.Final (depends on 2.0.1.Final)
[warn]      +- org.glassfish.jersey.ext:jersey-bean-validation:2.28 (depends on 2.0.1.Final)
[warn]      +- org.glassfish.jersey.core:jersey-server:2.28       (depends on 2.0.1.Final)
[warn]      +- io.swagger:swagger-core:1.5.3                      (depends on 1.1.0.Final)
...

Could you help me figure out how to fix this? Thanks.

timothyzhang

1 Answer

I found a workaround: filter the dependency jars manually. Here are my steps:

  1. Download all dependency jars of the two Kafka/Confluent packages with the Maven Dependency Plugin (see: Downloading all maven dependencies). My pom.xml file includes:
  <repositories>
    <repository>
      <id>Confluent</id>
      <name>Confluent Own Repo</name>
      <url>https://packages.confluent.io/maven/</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-schema-registry</artifactId>
        <version>5.4.1</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-avro-serializer</artifactId>
        <version>5.4.1</version>
    </dependency>
  </dependencies>

You can also initialize the project and its pom.xml from the Maven quickstart archetype:

mvn archetype:generate \
  -DarchetypeGroupId=org.apache.maven.archetypes \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DarchetypeVersion=1.4

Then update the pom.xml file and run the following command in the project folder:

mvn dependency:copy-dependencies

  2. Copy all downloaded jars from the target/dependency folder into the Spark jars folder:

cp downloadjar-kafka/target/dependency/* spark-3.0.1-bin-hadoop3.2/jars/

  3. If any jars conflict, keep only the higher version of each one in the Spark jars folder.

  4. Since I deploy this customized Spark distribution to Kubernetes, I build a Docker image from the spark-3.0.1-bin-hadoop3.2 folder with:

docker build -t spark-ext:v3.0.1 -f kubernetes/dockerfiles/spark/Dockerfile .
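Step 3 is the error-prone part when done by hand. The following Scala sketch only lists which jars in a folder are lower-versioned duplicates of the same artifact; it deletes nothing. It assumes jar names follow <artifact>-<version>.jar with the version starting with digits and a dot, and it uses a simplified version ordering (numeric parts compared as numbers, other parts as text), not Maven's exact rules. DedupJars, parse, compareVersions and jarsToDelete are hypothetical names, not part of Spark or Maven.

```scala
import java.io.File

// Sketch of step 3 ("keep only the higher version"): list lower-versioned
// duplicate jars in a folder. DedupJars is a hypothetical helper, not a tool
// from Spark or Maven. Assumes names like <artifact>-<version>.jar where the
// version begins with digits and a dot (e.g. kafka-clients-5.4.1-ccs.jar).
object DedupJars {
  // Lazy artifact group: the version starts at the first "-<digits>." run.
  private val JarName = """(.+?)-(\d+\..*)\.jar""".r
  // Version tokens: runs of digits or runs of letters; separators are ignored.
  private val Token = """\d+|[A-Za-z]+""".r

  /** Splits "name-1.2.3.jar" into ("name", "1.2.3"); None if the name doesn't fit. */
  def parse(fileName: String): Option[(String, String)] = fileName match {
    case JarName(artifact, version) => Some((artifact, version))
    case _                          => None
  }

  /** Simplified comparison (NOT Maven's ComparableVersion): numeric tokens
    * compare as numbers, other tokens as strings, a longer version wins a tie. */
  def compareVersions(a: String, b: String): Int = {
    @annotation.tailrec
    def loop(xs: List[String], ys: List[String]): Int = (xs, ys) match {
      case (Nil, Nil) => 0
      case (Nil, _)   => -1
      case (_, Nil)   => 1
      case (x :: xt, y :: yt) =>
        val c =
          if (x.forall(_.isDigit) && y.forall(_.isDigit)) BigInt(x).compare(BigInt(y))
          else x.compareTo(y)
        if (c != 0) c else loop(xt, yt)
    }
    loop(Token.findAllIn(a).toList, Token.findAllIn(b).toList)
  }

  private val versionOrdering: Ordering[String] = new Ordering[String] {
    def compare(a: String, b: String): Int = compareVersions(a, b)
  }

  /** For each artifact found in several versions, returns every jar except the
    * highest-versioned one. Names that don't parse are never returned. */
  def jarsToDelete(fileNames: Seq[String]): Seq[String] =
    fileNames
      .flatMap(name => parse(name).map { case (artifact, version) => (artifact, version, name) })
      .groupBy(_._1)
      .values
      .flatMap { copies =>
        val newest = copies.maxBy(_._2)(versionOrdering)
        copies.map(_._3).filterNot(_ == newest._3)
      }
      .toSeq
      .sorted

  def main(args: Array[String]): Unit = {
    // First argument: the Spark jars folder, e.g. spark-3.0.1-bin-hadoop3.2/jars
    val dir = new File(args.headOption.getOrElse("."))
    val names = Option(dir.list()).getOrElse(Array.empty[String]).filter(_.endsWith(".jar"))
    // Only prints the candidates; nothing is deleted.
    jarsToDelete(names.toSeq).foreach(println)
  }
}
```

It prints rather than deletes so the list can be reviewed first; for example, given jersey-server-2.28.jar and jersey-server-2.30.jar it reports only the 2.28 jar.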

I have verified that there are no conflicts: all Web UI tabs load when I run my Spark Structured Streaming application that connects to Kafka.
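If you would rather do this filtering in the build than in the Spark jars folder, one untested sketch is to exclude the two validation modules that the eviction warnings show arriving with kafka-schema-registry (jersey-bean-validation 2.28 and hibernate-validator 6.0.11.Final); ValidationBinder in the MultiException and the HV000183 message come from these two. This assumes the Spark job only uses the Schema Registry client side and never runs the registry's own REST server. Whether these two exclusions are sufficient is an assumption; check the result with evicted and the Web UI.

```scala
// build.sbt addition (untested sketch): drop the Jersey bean-validation stack
// that kafka-schema-registry pulls in transitively, so it is not bundled into
// the assembly jar next to Spark's own Jersey server. Assumes the job never
// needs the Schema Registry server code that uses these modules.
excludeDependencies ++= Seq(
  ExclusionRule("org.glassfish.jersey.ext", "jersey-bean-validation"),
  ExclusionRule("org.hibernate.validator", "hibernate-validator")
)
```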

timothyzhang