I have a NetApp StorageGRID (S3-compatible) in our company infrastructure, and I am new to S3. After processing a CSV file in Pandas, I need to write the result back to S3. The URL for the StorageGRID is https://myCompanys3.storage.net and the bucket is 'test_bucket'. I referred to https://stackoverflow.com/a/51777553/13065899
Based on other reading about Python/Pandas/S3, I followed these steps:
- Created a .aws folder in my user folder (Windows laptop)
- Created a credentials file in it with these entries:
```
[default]
aws_access_key_id=myAccessKey
aws_secret_access_key=mySecretAccessKey
```
- pip install s3fs
- Wrote this line of code:
```
df.to_csv('https://myCompanys3.storage.net/test_bucket/myTest.csv')
```
Got this error:
urllib.error.HTTPError: HTTP Error 403: Forbidden
Is the path given in to_csv above the correct way to construct the full path to the file?
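To figure out whether this is a credentials problem or a path problem, I was thinking of testing the connection with s3fs directly, along these lines. This is just my guess at how to point s3fs at our StorageGRID instead of AWS; I am not sure that client_kwargs / endpoint_url is the right place for the URL:

```python
import s3fs

# Guess: pass the StorageGRID URL as endpoint_url so requests go to the
# company grid instead of the default AWS endpoints. Credentials should be
# picked up automatically from ~/.aws/credentials.
fs = s3fs.S3FileSystem(
    anon=False,
    client_kwargs={"endpoint_url": "https://myCompanys3.storage.net"},
)

# List the bucket contents as a quick sanity check.
print(fs.ls("test_bucket"))
```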
All the examples I have seen so far start with 's3://' rather than a full URL.
Is 's3' a keyword that is needed for any read/write to the StorageGRID?
I also tried:
```
df.to_csv('s3://https://s3.medcity.net://hpg-dl-dev/PandasInvoiceTest.csv', index=False)
```
and got this error:
Invalid bucket name "https:": Bucket name must match the regex "^[a-zA-Z0-9.-_]{1,255}$"
Can someone help me with what I am missing? Perhaps there is an S3 configuration where I externalize the URL?
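For example, I am guessing (from the pandas and s3fs docs) that something like the following might be the intended approach, with only the bucket and key in the path and the StorageGRID URL supplied separately via storage_options. I have not been able to confirm this works against our grid, and the parameter names here are just my reading of the docs:

```python
import pandas as pd

# Guess: use the plain s3://bucket/key form and pass the StorageGRID
# endpoint through storage_options, which pandas forwards to s3fs
# (requires pandas >= 1.2).
df.to_csv(
    "s3://test_bucket/myTest.csv",
    index=False,
    storage_options={
        "client_kwargs": {"endpoint_url": "https://myCompanys3.storage.net"}
    },
)
```

Is this the right direction, or does StorageGRID need a different kind of configuration?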
Thank you in advance.