
My EMR cluster is in the us-east region, and I am trying to read an S3 file that lives in the us-west-2 region using a Spark session. I am seeing a connection timeout.

I am able to access the same file using the AWS CLI by specifying --region us-west-2.

Can you help me achieve the same thing using SparkSession or spark-shell? How do I pass the region while reading the file from a Spark session?

Caused by: com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.conn.ConnectTimeoutException: Connect to Xxxxxxxx-west2.s3.amazonaws.com:443 [xxxxxxx-lake-west2.s3.amazonaws.com/xxxx] failed: connect timed out
  at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:150)
  at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
Ramesh

1 Answer


You can use --conf to set the S3 endpoint for the bucket's region when running your spark-submit command, something like this:

spark-submit --name "Test Job" \
    --conf "spark.hadoop.fs.s3a.endpoint=s3-us-west-2.amazonaws.com" \
    <your_remaining_command_goes_here>
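
The same flag can be passed when launching spark-shell. A minimal sketch, where the bucket and key in the read path are placeholders:

spark-shell --conf "spark.hadoop.fs.s3a.endpoint=s3-us-west-2.amazonaws.com"

// inside the shell, the pre-created session is called spark;
// my-bucket-west2/path/to/file is a placeholder s3a path
val df = spark.read.text("s3a://my-bucket-west2/path/to/file")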

If you want to set the same thing programmatically in your Scala code, you can use:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConf = new SparkConf()
sparkConf.setAppName("Test Job")
// Point the S3A connector at the us-west-2 endpoint
sparkConf.set("spark.hadoop.fs.s3a.endpoint", "s3-us-west-2.amazonaws.com")

val sparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
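
If the session already exists, as it does in spark-shell where it is pre-created as spark, the same setting can be applied to the live Hadoop configuration instead of building a new SparkConf; a minimal sketch:

// apply the endpoint to the running session's Hadoop configuration
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "s3-us-west-2.amazonaws.com")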
Prasad Khode
  • I have added "fs.s3a.endpoint", but it is not working for me. I added it to both the SparkConf and the SparkContext as well: sparkContext.hadoopConfiguration().set("fs.s3a.endpoint", "s3.us-west-2.amazonaws.com"); – Ramesh May 21 '19 at 20:59
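
One detail worth noting about that comment: the stack trace in the question comes from the shaded EMRFS client (com.amazon.ws.emr.hadoop.fs...), which is what EMR uses for s3:// paths, while fs.s3a.endpoint only takes effect for paths read through the s3a:// scheme. A sketch of that variant, with a placeholder bucket and key:

// fs.s3a.* settings only affect the S3A connector, so the path must use
// the s3a:// scheme; my-bucket-west2/path/to/file is a placeholder
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "s3.us-west-2.amazonaws.com")
val df = spark.read.text("s3a://my-bucket-west2/path/to/file")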