
I need to connect Spark to my Redshift instance to generate data. I am using Spark 1.6 with Scala 2.10, along with a compatible JDBC driver and the spark-redshift connector. But I am facing a weird problem. I am using PySpark:

    # Read from Redshift; the connector unloads the query results to tempdir on S3.
    df = sqlContext.read \
        .format("com.databricks.spark.redshift") \
        .option("query", "select top 10 * from fact_table") \
        .option("url", "jdbc:redshift://redshift_host:5439/events?user=username&password=pass") \
        .option("tempdir", "s3a://redshift-archive/") \
        .load()

When I call df.show(), it fails with a permission-denied error on my bucket. This is weird because I can see files being created in the bucket, but they cannot be read.

P.S. I have also set the access key and secret access key.

P.S. I am also confused between the s3a and s3n file systems. Connector used: https://github.com/databricks/spark-redshift/tree/branch-1.x
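
For reference, here is how I am setting the keys (a minimal sketch; the key values are placeholders). As far as I understand, s3a and s3n read different Hadoop property names, so the pair has to match the scheme used in tempdir:

    # Hadoop configuration backing the S3 filesystem; credentials are placeholders.
    # The s3a filesystem reads fs.s3a.* properties:
    sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "MY_ACCESS_KEY")
    sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "MY_SECRET_KEY")
    # The older s3n filesystem reads differently named properties:
    # sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "MY_ACCESS_KEY")
    # sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "MY_SECRET_KEY")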


1 Answer


It seems the permissions are not set for Redshift to access the S3 files. Please follow the steps below (a sketch of the connector call follows the list):

  1. Add a bucket policy to that bucket that allows the Redshift account access.
  2. Create an IAM role in the Redshift account that Redshift can assume.
  3. Grant permissions to access the S3 bucket to the newly created role.
  4. Associate the role with the Redshift cluster.
  5. Run COPY statements.
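
For example, newer versions of the connector accept an aws_iam_role option so that Redshift uses the attached role for its COPY/UNLOAD against the tempdir. A minimal sketch (the role ARN is a placeholder, and the option may not be available on the 1.x branch):

    # Pass the ARN of the IAM role attached to the cluster (placeholder ARN).
    df = sqlContext.read \
        .format("com.databricks.spark.redshift") \
        .option("query", "select top 10 * from fact_table") \
        .option("url", "jdbc:redshift://redshift_host:5439/events?user=username&password=pass") \
        .option("aws_iam_role", "arn:aws:iam::123456789012:role/my-redshift-role") \
        .option("tempdir", "s3a://redshift-archive/") \
        .load()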
  • The above steps are perfect for solving the permission issue on the spark-redshift connector, but in my case the problem was with the Spark version. Spark 1.6 gave me the error, but the same code works fine in Spark 2.2. – Aldrin Machado Jun 20 '19 at 05:26