
I'm working with the combination of Spark 2.4.3 (the build bundled with Hadoop 2.7), Hadoop 3.2.0, and Ceph Luminous. When I try to use Spark to access Ceph (for example, by starting spark-sql from the shell), an exception like the one below is thrown:

 INFO impl.MetricsSystemImpl: s3a-file-system metrics system started
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.ProviderUtils.excludeIncompatibleCredentialProviders(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/Class;)Lorg/apache/hadoop/conf/Configuration;
        at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:740)
        at org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider.<init>(SimpleAWSCredentialsProvider.java:58)
        at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:600)

According to how-do-i-fix-a-nosuchmethoderror, a NoSuchMethodError most likely means that the class version compiled against differs from the class version present at run time.
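
As a quick way to confirm which version of that class Spark actually loads, the bundled hadoop-common jar can be inspected directly. This is only a sketch; the jar name hadoop-common-2.7.3.jar is an assumption based on what the Spark 2.4.3 "with Hadoop 2.7" build ships:

    # Which hadoop-common does Spark bundle?
    ls "$SPARK_HOME"/jars/hadoop-common-*.jar

    # Print the methods of ProviderUtils in that jar; if the two-argument
    # excludeIncompatibleCredentialProviders(Configuration, Class) overload is
    # missing there, hadoop-aws-3.2.0 has nothing to call at run time.
    javap -classpath "$SPARK_HOME"/jars/hadoop-common-2.7.3.jar \
          org.apache.hadoop.security.ProviderUtils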

To access Ceph, the AWS-related jars aws-java-sdk-bundle-1.11.375.jar and hadoop-aws-3.2.0.jar under $HADOOP_HOME/share/hadoop/tools/lib are what is actually used. I did the following:

1. Copy those two jars to $SPARK_HOME/jars.
2. Modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh to add the line below:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*

With the steps above I can use HDFS to access Ceph; for example, hdfs dfs -ls lists the folders under a Ceph bucket. As far as I understand, this proves that the AWS-related jars work fine.
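
For completeness, the endpoint and credentials for the Ceph RADOS Gateway can also be passed as generic -D options on that check; a minimal sketch with a hypothetical endpoint, keys, and bucket name:

    # Hypothetical RGW endpoint, credentials and bucket; substitute real values.
    # With the tools/lib jars on HADOOP_CLASSPATH this lists the bucket via s3a.
    hdfs dfs \
      -D fs.s3a.endpoint=http://ceph-rgw.example.com:7480 \
      -D fs.s3a.access.key=MY_ACCESS_KEY \
      -D fs.s3a.secret.key=MY_SECRET_KEY \
      -D fs.s3a.path.style.access=true \
      -ls s3a://my-bucket/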

So why is this s3a exception thrown when I invoke Spark?

  • Possible duplicate of [How to access s3a:// files from Apache Spark?](https://stackoverflow.com/questions/30385981/how-to-access-s3a-files-from-apache-spark) – Neo-coder Jul 29 '19 at 09:43
  • @Yogesh I checked that link before; it doesn't solve my problem. As I currently understand it, the Spark I used is spark2.4.3_with_built-in_hadoop2.7, and that should be the problem. I think I need to use the spark2.4.3_without_hadoop version instead (see the sketch below). – Eugene Jul 30 '19 at 01:28
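
If the "without Hadoop" Spark build is used, Spark has to be pointed at the locally installed Hadoop 3.2.0 jars. A minimal sketch of $SPARK_HOME/conf/spark-env.sh for that setup; the install path is an assumption:

    # spark-env.sh for a "Pre-built with user-provided Apache Hadoop" Spark package.
    # Put the Hadoop 3.2.0 classpath (including share/hadoop/tools/lib, which holds
    # hadoop-aws and the AWS SDK bundle) on Spark's classpath.
    export HADOOP_HOME=/opt/hadoop-3.2.0   # assumed install location
    export SPARK_DIST_CLASSPATH=$(hadoop classpath):$HADOOP_HOME/share/hadoop/tools/lib/*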

1 Answer


All the hadoop-* JARs need to be 100% matching on versions, else you get to see stack traces like this.

For more information please reread
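
As a quick way to spot such a mismatch, compare the version suffixes of every hadoop-* jar Spark can see with those of the installed Hadoop. A sketch, using the paths from the question:

    # Jars bundled with Spark (here a mix: hadoop-*-2.7.x plus the copied hadoop-aws-3.2.0)
    ls "$SPARK_HOME"/jars/hadoop-*.jar

    # Jars of the installed Hadoop 3.2.0
    ls "$HADOOP_HOME"/share/hadoop/common/hadoop-*.jar \
       "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-aws-*.jar

    # Every hadoop-* jar on Spark's classpath must carry the same version suffix;
    # mixing -2.7.x and -3.2.0 produces NoSuchMethodError stack traces like the one above.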

  • How can I find out whether the jars match, without trial and error? – Nandha Nov 29 '19 at 09:05
  • they all have the same suffix, like -2.8.4 or -3.1.2 – stevel Dec 17 '19 at 17:05
  • Hi, I have the same error, so I tried this, but something is not clear: should I copy all of the hadoop-*-3.2.1.jar files to the Spark jars folder? (My Hadoop version is 3.2.1; the original files in the Spark jars folder are version 2.7.3.) – Frank May 19 '20 at 21:40
  • All the hadoop-* JARs need to be 100% matching on versions, else you get to see stack traces like this. – stevel May 20 '20 at 09:56
  • `All the hadoop-* JARs need to be 100% matching on versions` is the key sentence, which resolved my hours of debugging. Thanks – Faaiz Aug 11 '22 at 17:03