
My Spark job fails with the following error:

Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:194)
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:216)
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.<init>(S3AInstrumentation.java:139)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:174)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:44)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:620)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:604)
    at net.appcloudbox.autopilot.eventstats.MiningTask$$anonfun$clean_data$2.apply(MiningTask.scala:141)
    at net.appcloudbox.autopilot.eventstats.MiningTask$$anonfun$clean_data$2.apply(MiningTask.scala:140)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at net.appcloudbox.autopilot.eventstats.MiningTask.clean_data(MiningTask.scala:140)
    at net.appcloudbox.autopilot.eventstats.MiningTask.run(MiningTask.scala:35)
    at net.appcloudbox.autopilot.eventstats.EventStats$.main(EventStats.scala:39)
    at net.appcloudbox.autopilot.eventstats.EventStats.main(EventStats.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Some people say this is caused by a Hadoop version mismatch. At first I was using hadoop-2.7.5.tar.gz with spark-2.3.0-bin-hadoop2.7.tgz, and my job hit the error above. After switching to hadoop-2.8.5.tar.gz, still with spark-2.3.0-bin-hadoop2.7.tgz, the job hit the same error again. My code is as follows:

spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", config.get("aws_access_key_id"))
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", config.get("aws_secret_access_key"))
// Keys set directly on hadoopConfiguration must not carry the "spark.hadoop." prefix.
spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
......
spark.read.parquet("s3a://bucket/...../sample.parquet").rdd
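
For completeness, here is a minimal sketch of the equivalent setup done through the SparkSession builder, where the `spark.hadoop.` prefix does belong (Spark strips it and forwards the rest to the Hadoop Configuration). The bucket path and environment-variable names below are placeholders, not from the original code:

    import org.apache.spark.sql.SparkSession

    // Minimal sketch, assuming credentials are exported as environment variables
    // (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are assumed names here).
    val spark = SparkSession.builder()
      .appName("s3a-read")
      .config("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
      .config("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
      .getOrCreate()

    // "s3a://my-bucket/path/sample.parquet" is a placeholder path.
    val df = spark.read.parquet("s3a://my-bucket/path/sample.parquet")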
  • Possible duplicate of [Unable to access S3 data using Spark 2.2](https://stackoverflow.com/questions/48750464/unable-to-access-s3-data-using-spark-2-2) – stevel Oct 29 '18 at 13:02

1 Answer


I solved the problem. As noted above, I was using hadoop-2.8.5.tar.gz with spark-2.3.0-bin-hadoop2.7.tgz, but the jars directory of the Spark installation contains Hadoop 2.7.x jars. Replacing those Hadoop jars with the 2.8.5 versions fixes the mismatch.
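
To confirm which Hadoop build a job is really loading, here is a minimal diagnostic sketch, assuming hadoop-common is on the classpath (it always is inside a Spark job). Run it with spark-submit so it sees the same classpath as the failing job; `HadoopVersionCheck` is a hypothetical class name:

    import org.apache.hadoop.util.VersionInfo

    object HadoopVersionCheck {
      def main(args: Array[String]): Unit = {
        // Version string baked into the Hadoop jars actually on the classpath.
        println(s"Hadoop version on classpath: ${VersionInfo.getVersion}")
        // Jar that supplied the class; if this points into $SPARK_HOME/jars at a
        // hadoop-common 2.7.x jar, the IllegalAccessError above is expected.
        println(classOf[VersionInfo].getProtectionDomain.getCodeSource.getLocation)
      }
    }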

  • This actually helped me. All the logs said I was using Hadoop 2.8.5, but the Spark dir had 2.7.3 jars and used them from there. Super tricky problem to spot. – user1816142 Mar 12 '20 at 14:33