
I am attempting to load data from an AWS S3 bucket with Spark, but I keep getting the following error:

Py4JJavaError: An error occurred while calling o152.csv. : com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: M4Z1B0MTQAY2GDCD, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: GS9ftm1p/TpmZNS4KtsAVmmRfQOIVnIg/22rhnI4i5HKF40pT/QGBAXTwrVNWsHCUQFhEOXD3Gk= at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)

My code can be seen below:

I saved my AWS Access Key ID and AWS Secret Access key in the credentials.cfg file.
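For reference, the file has an [AWS] section matching what the code reads below (actual values redacted):

[AWS]
AWS_ACCESS_KEY_ID = <my access key id>
AWS_SECRET_ACCESS_KEY = <my secret access key>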

I have a file called payment.csv in a bucket called datalakesexamp1 in S3.

from pyspark.sql import SparkSession
import os
import configparser

config = configparser.ConfigParser()

config.read_file(open('aws/credentials.cfg'))

os.environ["AWS_ACCESS_KEY_ID"]= config['AWS']['AWS_ACCESS_KEY_ID']
os.environ["AWS_SECRET_ACCESS_KEY"]= config['AWS']['AWS_SECRET_ACCESS_KEY']

spark = SparkSession.builder\
                     .config("spark.jars.packages","org.apache.hadoop:hadoop-aws:2.7.0")\
                     .getOrCreate()

df = spark.read.csv("s3a://datalakesexamp1/payment.csv")

I believe the problem could be the 2.7.0 version of hadoop-aws, but I can't figure out which version is the correct one. I have tried other versions and get similar errors.
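For what it's worth, the kind of variant I understand is usually suggested passes the keys straight to the S3A connector instead of relying on environment variables, and pins hadoop-aws to the Hadoop version the PySpark distribution was built with (the 3.3.4 below is only a guess on my part, and I haven't confirmed it resolves the 403):

import configparser
from pyspark.sql import SparkSession

config = configparser.ConfigParser()
config.read_file(open('aws/credentials.cfg'))

# Assumption: hadoop-aws should match the Hadoop version bundled with PySpark,
# not necessarily 2.7.0; 3.3.4 here is just a placeholder.
spark = SparkSession.builder \
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4") \
    .config("spark.hadoop.fs.s3a.access.key", config['AWS']['AWS_ACCESS_KEY_ID']) \
    .config("spark.hadoop.fs.s3a.secret.key", config['AWS']['AWS_SECRET_ACCESS_KEY']) \
    .getOrCreate()

df = spark.read.csv("s3a://datalakesexamp1/payment.csv")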
