I am getting a java.lang.NoSuchMethodError for org.apache.hadoop.security.ProviderUtils.excludeIncompatibleCredentialProviders when submitting my Spark job with spark-submit:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.ProviderUtils.excludeIncompatibleCredentialProviders(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/Class;)Lorg/apache/hadoop/conf/Configuration;
    at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:740)
    at org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider.<init>(SimpleAWSCredentialsProvider.java:58)
    at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:600)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:260)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
Spark version Installed: 2.4.5
My Configuration: build.gradle:
buildscript {
    repositories {
        maven {
            url "https://*********/****/content/repositories/thirdparty"
            credentials {
                username ****User
                password ****Pwd
            }
        }
    }
}
plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '1.2.3'
}
group 'com.felix'
version '1.0-SNAPSHOT'
sourceCompatibility = 1.8
repositories {
    mavenCentral()
    mavenLocal()
}
dependencies {
    compile group: 'org.apache.spark', name: 'spark-hadoop-cloud_2.11', version: '2.4.2.3.1.3.0-79'
    // https://mvnrepository.com/artifact/org.apache.spark/spark-sql
    compileOnly group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
    // https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind
    compileOnly group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.6.7.3'
    // https://mvnrepository.com/artifact/org.apache.parquet/parquet-column
    compileOnly group: 'org.apache.parquet', name: 'parquet-column', version: '1.10.1'
    // https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop
    compileOnly group: 'org.apache.parquet', name: 'parquet-hadoop', version: '1.10.1'
    // https://mvnrepository.com/artifact/org.apache.spark/spark-sketch
    compileOnly group: 'org.apache.spark', name: 'spark-sketch_2.11', version: '2.4.5'
    // https://mvnrepository.com/artifact/org.apache.spark/spark-core
    compileOnly group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
    // https://mvnrepository.com/artifact/org.apache.spark/spark-catalyst
    compileOnly group: 'org.apache.spark', name: 'spark-catalyst_2.11', version: '2.4.5'
    // https://mvnrepository.com/artifact/org.apache.spark/spark-tags
    compileOnly group: 'org.apache.spark', name: 'spark-tags_2.11', version: '2.4.5'
    compileOnly group: 'org.apache.spark', name: 'spark-avro_2.11', version: '2.4.5'
    // https://mvnrepository.com/artifact/org.apache.spark/spark-hive
    compileOnly group: 'org.apache.spark', name: 'spark-hive_2.11', version: '2.4.5'
    // https://mvnrepository.com/artifact/org.apache.xbean/xbean-asm6-shaded
    compile group: 'org.apache.xbean', name: 'xbean-asm7-shaded', version: '4.15'
    // https://mvnrepository.com/artifact/org.codehaus.janino/commons-compiler
    compileOnly group: 'org.codehaus.janino', name: 'commons-compiler', version: '3.0.9'
    // https://mvnrepository.com/artifact/org.codehaus.janino/janino
    compileOnly group: 'org.codehaus.janino', name: 'janino', version: '3.0.9'
    // Hive Metastore
    compile group: 'org.postgresql', name: 'postgresql', version: '42.2.9'
    compile 'com.google.guava:guava:22.0'
    compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.2.1'
    compile group: 'org.apache.hadoop', name: 'hadoop-aws', version: '3.2.1'
    compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '3.2.1'
    compile group: 'com.amazonaws', name: 'aws-java-sdk-bundle', version: '1.11.271'
    compile group: 'io.delta', name: 'delta-core_2.11', version: '0.5.0'
    compile group: 'joda-time', name: 'joda-time', version: '2.10.5'
    compile group: 'org.projectlombok', name: 'lombok', version: '1.16.6'
}
shadowJar {
    zip64 true
}
Spark job:
df.distinct()
    .withColumn("date", date_format(col(EFFECTIVE_PERIOD_START), "yyyy-MM-dd"))
    .repartition(col("date"))
    .write()
    .format(fileFormat)
    .partitionBy("date")
    .mode(SaveMode.Append)
    .option("fs.s3a.committer.name", "partitioned")
    .option("fs.s3a.committer.staging.conflict-mode", "append")
    .option("spark.sql.sources.commitProtocolClass", "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
    .option("spark.sql.parquet.output.committer.class", "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
    .option("compression", compressionCodecName.name().toLowerCase())
    .save(DOWNLOADS_NON_COMPACT_PATH);
Executed script: spark-submit --class com.felix.DataIngestionApplication --master local DataIngestion-1.0-SNAPSHOT-all.jar
From what I understand, the Hadoop version is causing the issue: all the hadoop-* JARs need to match 100% on version. So I have ensured that all org.apache.hadoop dependencies are the same version (3.2.1), but I still get this error.
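To check whether the 3.2.1 classes are really the ones being loaded, a small probe like the following could be run through the same spark-submit command. This is only a sketch: the class name ClasspathProbe and its helpers locate and hasMethod are names made up for illustration, and it relies on nothing but JDK reflection. It prints the JAR that supplies ProviderUtils and whether that copy has the method named in the exception.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic class; not part of Spark or Hadoop.
class ClasspathProbe {

    /** Returns the location a class was loaded from, or a note if it cannot be determined. */
    public static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source.
            return src == null ? "(bootstrap/unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    /** Returns true if the class is loadable and exposes a public method with this name. */
    public static boolean hasMethod(String className, String methodName) {
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String cls = "org.apache.hadoop.security.ProviderUtils";
        System.out.println(cls + " loaded from: " + locate(cls));
        System.out.println("has excludeIncompatibleCredentialProviders: "
                + hasMethod(cls, "excludeIncompatibleCredentialProviders"));
    }
}
```

If the printed location turned out to be a JAR under Spark's own jars directory rather than my shadow JAR, that would suggest the older copy of the class is the one being picked up.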
I want to use Hadoop 3 or newer, since it provides the newer S3A committers such as "PartitionedStagingCommitter". How is everybody using these with Spark 2.4.5?
How can I force Spark to use Hadoop 3.2.1 instead of the Hadoop versions in Spark's jars directory? When I looked at /usr/local/Cellar/apache-spark/2.4.5/libexec/jars/ I could see hadoop-common-2.7.3.jar, hadoop-client-2.7.3.jar, etc. So how do I force the newer Hadoop version so that I can use the new S3A committers?
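A related check, again only a sketch with made-up names (HadoopVersionProbe, callStaticString), would be to ask Hadoop's own VersionInfo class which version it reports at runtime. Reflection is used so that the snippet compiles without Hadoop on the compile classpath; org.apache.hadoop.util.VersionInfo.getVersion() is the only Hadoop call it assumes.

```java
import java.lang.reflect.Method;

// Hypothetical diagnostic class; not part of Spark or Hadoop.
class HadoopVersionProbe {

    /** Invokes a public static no-arg method and returns its result as a String, or a note on failure. */
    public static String callStaticString(String className, String methodName) {
        try {
            Method m = Class.forName(className).getMethod(methodName);
            // A null receiver is valid here because the method is expected to be static.
            return String.valueOf(m.invoke(null));
        } catch (ReflectiveOperationException | LinkageError e) {
            return "(unavailable: " + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println("Hadoop version at runtime: "
                + callStaticString("org.apache.hadoop.util.VersionInfo", "getVersion"));
    }
}
```

Running this once from IntelliJ and once through spark-submit should show whether the two launches see different Hadoop versions.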
Note: If I don't use spark-submit and instead run the application from IntelliJ with all dependencies declared as compile, the app starts and runs without an exception, and I can see the data being inserted in S3.