
I am trying to read files in my buckets that are a mix of csv/xlsx, and I am getting a 403, which I do not quite understand since I am setting AWS creds through the credential chain and env vars. I am using a URL over HTTPS; when I switch the URL to s3://, it tells me the bucket doesn't exist, which it definitely does. I have s3fs installed as well.

TL;DR: HTTPS URLs throw 403s; s3:// URLs throw "bucket doesn't exist" when the bucket definitely exists.

Code:

import boto3
import pandas as pd


def get_file(project_name, uid) -> list:
    # Collect HTTPS links to every object under <project_name>/raw_datasets
    files = []
    s3 = boto3.resource('s3', region_name='us-east-2')
    bucket_str = 'stackstr-' + uid
    url = 'https://' + bucket_str + '.s3.us-east-2.amazonaws.com/'
    bucket = s3.Bucket(bucket_str)
    for obj in bucket.objects.filter(Prefix=project_name + '/raw_datasets'):
        link = url + obj.key
        files.append(link)
    print(files)
    return files


def generate_dataframes(files) -> list:
    # Load each file into a DataFrame based on its extension
    df_list = []
    for fname in files:
        ext = fname.split(".")[-1]
        if ext == 'xlsx':
            df = pd.read_excel(fname)
            df_list.append(df)
        elif ext == 'csv':
            df = pd.read_csv(fname)
            df_list.append(df)

    print(df_list)
    return df_list
  • Does this answer your question? [How to import a text file on AWS S3 into pandas without writing to disk](https://stackoverflow.com/questions/37703634/how-to-import-a-text-file-on-aws-s3-into-pandas-without-writing-to-disk) – Michael Delgado Jun 08 '20 at 02:11
  • @MichaelDelgado Not really. I have read that post, and according to another post on Stack Overflow you can pass full S3 URLs into read_excel as well as read_csv. I am getting creds from env vars via the AWS credential chain, but it's apparently not working within the second function? Unless you can't run a request against an S3 URL? – dmc94 Jun 08 '20 at 02:15
  • Specifically, if reading directly from the URL with pd.read_csv or read_excel, don't include the `.s3.us-east-2.amazonaws.com` suffix; just `s3://[bucket_name]/[blob-path]`. See the [s3fs docs](https://s3fs.readthedocs.io/en/latest/) for examples/credential information. – Michael Delgado Jun 08 '20 at 02:17
  • @MichaelDelgado Awesome, thank you! Can't believe I missed that. – dmc94 Jun 08 '20 at 02:26

1 Answer


Michael Delgado provided the correct answer in the comments, quoted below:

Specifically, if reading directly from the URL with pd.read_csv or read_excel, don't include the `.s3.us-east-2.amazonaws.com` suffix; just `s3://[bucket_name]/[blob-path]`. See the [s3fs docs](https://s3fs.readthedocs.io/en/latest/) for examples/credential information.
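
For reference, a minimal sketch of both functions rewritten along those lines (assuming s3fs is installed, credentials resolve through the standard AWS chain, and an Excel engine such as openpyxl is available for the .xlsx files; the bucket and prefix names are taken from the question):

    import boto3
    import pandas as pd


    def get_file(project_name, uid) -> list:
        # Build s3:// URLs instead of HTTPS links so pandas/s3fs can
        # open them directly with the ambient AWS credentials.
        s3 = boto3.resource('s3', region_name='us-east-2')
        bucket_str = 'stackstr-' + uid
        bucket = s3.Bucket(bucket_str)
        return ['s3://' + bucket_str + '/' + obj.key
                for obj in bucket.objects.filter(Prefix=project_name + '/raw_datasets')]


    def generate_dataframes(files) -> list:
        # pandas hands s3:// paths to s3fs, which signs each request
        # with your credentials.
        df_list = []
        for fname in files:
            ext = fname.split('.')[-1]
            if ext == 'xlsx':
                df_list.append(pd.read_excel(fname))
            elif ext == 'csv':
                df_list.append(pd.read_csv(fname))
        return df_list

This also explains both symptoms: a plain HTTPS GET against a private bucket is unsigned, hence the 403, while keeping the `.s3.us-east-2.amazonaws.com` suffix inside an s3:// URL makes s3fs look for a bucket with that literal name, hence "bucket doesn't exist".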
