I am attempting to load data from an AWS S3 bucket with Spark, but I keep getting this error:
Py4JJavaError: An error occurred while calling o152.csv. : com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: M4Z1B0MTQAY2GDCD, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: GS9ftm1p/TpmZNS4KtsAVmmRfQOIVnIg/22rhnI4i5HKF40pT/QGBAXTwrVNWsHCUQFhEOXD3Gk= at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
I saved my AWS Access Key ID and AWS Secret Access Key in a credentials.cfg file. I have a file called payment.csv in a bucket called datalakesexamp1 in S3. My code can be seen below:
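For reference, the credentials file is in the ini format that configparser expects; the key values below are placeholders, not my real keys:

```ini
[AWS]
AWS_ACCESS_KEY_ID = AKIA...
AWS_SECRET_ACCESS_KEY = ...
```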
from pyspark.sql import SparkSession
import configparser
import os

# Read the AWS keys from the credentials file and export them as
# environment variables for the S3 connector to pick up.
config = configparser.ConfigParser()
config.read('aws/credentials.cfg')
os.environ["AWS_ACCESS_KEY_ID"] = config['AWS']['AWS_ACCESS_KEY_ID']
os.environ["AWS_SECRET_ACCESS_KEY"] = config['AWS']['AWS_SECRET_ACCESS_KEY']

# Pull in the hadoop-aws package that provides the s3a:// filesystem.
spark = SparkSession.builder \
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:2.7.0") \
    .getOrCreate()

df = spark.read.csv("s3a://datalakesexamp1/payment.csv")
I believe the problem could be the 2.7.0 version of hadoop-aws, but I can't figure out where to find the correct one. I have tried other versions and get similar errors.
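One thing worth trying: exporting the keys as environment variables is not always enough for older hadoop-aws builds, so a common workaround is to set the fs.s3a.* options on Spark's Hadoop configuration directly. A minimal sketch, assuming the [AWS] section layout used in the code above; the s3a_options helper name is mine, not from any library:

```python
import configparser

def s3a_options(cfg_path: str) -> dict:
    """Read AWS keys from an ini-style credentials file and return
    the fs.s3a options that Spark's Hadoop layer expects."""
    config = configparser.ConfigParser()
    config.read(cfg_path)
    return {
        "fs.s3a.access.key": config["AWS"]["AWS_ACCESS_KEY_ID"],
        "fs.s3a.secret.key": config["AWS"]["AWS_SECRET_ACCESS_KEY"],
    }

# Applying the options requires an active SparkSession:
# hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
# for key, value in s3a_options("aws/credentials.cfg").items():
#     hadoop_conf.set(key, value)
```

Separately, the hadoop-aws version generally has to match the Hadoop version your Spark build was compiled against (you can check the hadoop-common jar in pyspark's jars directory), since a mismatch can surface as authentication-style failures.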