I have a Spark application that reads data from HDFS and writes it to S3. Below are the versions of the components I am using:
Spark: 2.3.1
Hadoop: 2.7.3
Scala: 2.11.8
I am using hadoop-aws-2.7.3.jar, hadoop-common-2.7.3.jar and aws-java-sdk-1.7.4.jar. I followed several Hadoop-related blog posts and also checked the Maven Repository site to get the right combination of jars.
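Because more than one version of the Hadoop and AWS SDK jars can end up on the classpath at the same time (for example, jars bundled with the Spark distribution plus the ones passed on the command line), it can help to confirm which jar each relevant class is actually loaded from. The following is a minimal, dependency-free sketch using plain JVM reflection. The object name `WhichJar` and the helper `locationOf` are hypothetical names chosen for illustration; the class names are the ones that appear in the stack traces in this question.

```scala
// Hypothetical diagnostic helper: reports where a class is loaded from on the current classpath.
object WhichJar {
  // Returns the code-source location (usually a jar path) of `className`,
  // or None if the class cannot be loaded or comes from the JVM bootstrap loader.
  def locationOf(className: String): Option[String] = {
    val loader = Thread.currentThread.getContextClassLoader
    try {
      // initialize = false: load the class without running its static initializers
      val cls = Class.forName(className, false, loader)
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    } catch {
      // ClassNotFoundException: class is absent;
      // LinkageError: a class it depends on is missing or incompatible
      case _: ClassNotFoundException | _: LinkageError => None
    }
  }

  def main(args: Array[String]): Unit = {
    // Class names taken from the stack traces in this question
    val classes = Seq(
      "org.apache.hadoop.fs.s3a.S3AFileSystem",            // expected in hadoop-aws
      "org.apache.hadoop.metrics2.lib.MutableCounterLong", // expected in hadoop-common
      "com.amazonaws.AmazonClientException"                // expected in aws-java-sdk
    )
    classes.foreach { name =>
      println(s"$name -> ${locationOf(name).getOrElse("not found (or bootstrap class)")}")
    }
  }
}
```

One way to use it, assuming it is pasted into a spark-shell started with the same jars as the failing job, is `WhichJar.main(Array.empty)`; printing `org.apache.hadoop.util.VersionInfo.getVersion` alongside it shows which hadoop-common version is in use. If the hadoop-aws and hadoop-common locations point at different Hadoop versions, that mismatch would be consistent with an `IllegalAccessError` like the one below, since that error typically indicates a binary incompatibility between classes compiled against different versions.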
This is the code that uploads the file to S3:
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "<access_key>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "<secret_key>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "<access_endpoint>")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.path.style.access", "true")
val wikipediaDataitems = spark.read.json("<some_json_file_in_hdfs>")
wikipediaDataitems.write.format("json").save("s3a://<bucket_name>/wikipedia.json")
This is the error I get:
Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:163)
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:185)
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.<init>(S3AInstrumentation.java:112)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:146)
I went through a lot of Stack Overflow questions from people who faced the same issue and tried different combinations of the hadoop-aws, hadoop-common and aws-java-sdk jars, with no luck so far.
These are the combinations I have tried, with the relevant error for each:
hadoop-aws-2.7.3.jar, hadoop-common-2.7.3.jar, aws-java-sdk-1.10.6.jar
    at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:452)
    at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:548)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:278)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
    ... 49 elided
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
hadoop-aws-2.8.2.jar, hadoop-common-2.8.2.jar, aws-java-sdk-1.10.6.jar
java.lang.NoClassDefFoundError: com/amazonaws/AmazonClientException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
hadoop-aws-2.7.3.jar, hadoop-common-2.7.3.jar, aws-java-sdk-1.11.123.jar
Caused by: java.lang.ClassNotFoundException: com.amazonaws.event.ProgressListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 66 more
hadoop-aws-2.7.7.jar, hadoop-common-2.7.7.jar, aws-java-sdk-1.7.4.jar
Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:163)
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:185)
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.<init>(S3AInstrumentation.java:112)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:146)
Can anyone help me?