2

I am trying to read Redshift table data into a DataFrame and write that DataFrame to another Redshift table. I am using the following .jars with spark-submit for this task.

Here is the command:

spark-submit --jars RedshiftJDBC41-1.2.12.1017.jar,minimal-json-0.9.4.jar,spark-avro_2.11-3.0.0.jar,spark-redshift_2.10-2.0.0.jar,aws-java-sdk-sqs-1.11.694.jar,aws-java-sdk-s3-1.11.694.jar,aws-java-sdk-core-1.11.694.jar --packages org.apache.hadoop:hadoop-aws:2.7.4 t.py 
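
t.py does roughly the following (just a sketch of the relevant part; the JDBC URL, bucket, and table names here are placeholders, not the real ones):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('redshift_copy').getOrCreate()

# Read the source Redshift table into a DataFrame, staging through S3 via s3a.
df = spark.read \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://HOST:5439/DB?user=USER&password=PASSWORD") \
    .option("dbtable", "source_table") \
    .option("tempdir", "s3a://bucket/rs_temp_data") \
    .load()

# Write the DataFrame to another Redshift table through the same staging directory.
df.write \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://HOST:5439/DB?user=USER&password=PASSWORD") \
    .option("dbtable", "target_table") \
    .option("tempdir", "s3a://bucket/rs_temp_data") \
    .mode("error") \
    .save()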

I tried changing the versions of all the jars, and the hadoop-aws version accordingly, as suggested in various Stack Overflow answers, with no luck.

Traceback (most recent call last):
  File "/home/ubuntu/trell-ds-framework/data_engineering/data_migration/t.py", line 21, in <module>
    .option("tempdir", "s3a://AccessKey:AccessSecret@big-query-to-rs/rs_temp_data") \
  File "/home/ubuntu/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
  File "/home/ubuntu/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/home/ubuntu/spark-2.3.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/home/ubuntu/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o37.load.
: java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at com.databricks.spark.redshift.Utils$.assertThatFileSystemIsNotS3BlockFileSystem(Utils.scala:124)
    at com.databricks.spark.redshift.RedshiftRelation.<init>(RedshiftRelation.scala:52)
    at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:49)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

2019-12-21 14:38:03 INFO  SparkContext:54 - Invoking stop() from shutdown hook
2019-12-21 14:38:03 INFO  AbstractConnector:318 - Stopped Spark@3c115b0a{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-12-21 14:38:03 INFO  SparkUI:54 - Stopped Spark web UI at http://ip-172-30-1-193.ap-south-1.compute.internal:4040
2019-12-21 14:38:03 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!

Can anybody help me out here with what the issue could be? Is it a library issue with the .jars, with Hadoop, or something else?

Thanks.

iamabhaykmr
  • Good question .... – Chandni Dec 21 '19 at 09:33
  • I think it's a version mismatch; the version of the library you are using doesn't have that method, I guess – Arun Kamalanathan Dec 21 '19 at 09:38
  • hadoop 2.7 was built against AWS SDK 1.7; the library has changed too much for it to work with 1.11. Upgrade all the hadoop-* JARs in your Spark installation to a newer version, such as 3.1.1 (do not just try to update hadoop-aws; that is doomed) – stevel Dec 30 '19 at 15:13
  • see also https://stackoverflow.com/questions/43929025/spark-read-s3-using-sc-textfiles3a-bucket-filepath-java-lang-nosuchmethod – Randall Whitman May 26 '21 at 22:12
  • Does this answer your question? [Spark read s3 using sc.textFile("s3a://bucket/filePath"). java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager](https://stackoverflow.com/questions/43929025/spark-read-s3-using-sc-textfiles3a-bucket-filepath-java-lang-nosuchmethod) – stevel Jun 02 '22 at 13:55

4 Answers

1

I ran into a similar kind of error a few days back.

I was able to resolve it by upgrading the 'aws-java-sdk-s3' jar to the latest version.

Dheeraj
1

In my case I was using the s3a connector and PySpark on SageMaker with Hadoop 2.7.3 jars. To fix it, I needed to install the following jars: hadoop-aws-2.7.3.jar and aws-java-sdk-1.7.4.jar. Using these instead of the documented aws-java-sdk-bundle jar resolved my issue.

My code:

from pyspark.sql import SparkSession

# Use the hadoop-aws and aws-java-sdk jars that match the Hadoop 2.7.3 build
builder = SparkSession.builder.appName('appName')
builder = builder.config('spark.jars', 'hadoop-aws-2.7.3.jar,aws-java-sdk-1.7.4.jar')
spark = builder.getOrCreate()
Greg
0

I had the same issue while using spark-shell.

The issue was that the jar was being loaded after all the other jars; I moved it to the beginning of the list and the issue was resolved.
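
For example, something like this (a hypothetical invocation; the jar names are only illustrative), with the jar that provides the missing class moved to the front of the list:

spark-shell --jars aws-java-sdk.jar,hadoop-aws.jar,RedshiftJDBC41.jar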

WannaGetHigh
0

As the s3a troubleshooting docs say:

"Do not attempt to 'drop in' a newer version of the AWS SDK than that which the Hadoop version was built with. Whatever problem you have, changing the AWS SDK version will not fix things, only change the stack traces you see."

Stop using the eight-year-old Hadoop 2.7 binaries; upgrade to a Spark release built with Hadoop 3.3.1 binaries and try again.
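
For example (a sketch; 3.3.1 is illustrative and must match the Hadoop version your Spark build ships), let --packages pull the S3A connector; hadoop-aws then brings in the matching aws-java-sdk-bundle transitively, so none of the individual aws-java-sdk-* jars need to be passed:

spark-submit --packages org.apache.hadoop:hadoop-aws:3.3.1 t.py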

stevel