I'm downloading a file (to be precise, a set of Parquet files) from S3 and converting it to a Pandas DataFrame. I'm doing that with the Pandas function read_parquet and s3fs, as described here:

df = pd.read_parquet(f's3://{bucket}/{path}')

However, so far I have only been able to do that by authenticating via environment variables or the AWS config file. Because of company standards, I would instead like to authenticate via variables in code, the way we do with pyarrow.parquet:

import s3fs
import pyarrow.parquet as pq
fs = s3fs.S3FileSystem(key=config.AWS_ACCESS_KEY_ID, secret=config.AWS_SECRET_ACCESS_KEY)
df = pq.ParquetDataset(f's3://{bucket}/{path}', filesystem=fs).read().to_pandas()

Is there a way to do that with read_parquet? Can't I use something like a filesystem argument with it?

In case anyone is curious, I'm not using pq.ParquetDataset because it is too slow (I have no idea why).
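Update: if I'm reading the docs right, newer pandas releases (1.1+) accept a storage_options dict that read_parquet forwards to s3fs, which seems to cover this. A sketch (the credential values are placeholders standing in for config.AWS_ACCESS_KEY_ID and config.AWS_SECRET_ACCESS_KEY):

```python
import pandas as pd

# storage_options is forwarded to s3fs.S3FileSystem (pandas >= 1.1).
# Placeholder credentials; substitute your own config values.
storage_options = {
    "key": "YOUR_ACCESS_KEY_ID",
    "secret": "YOUR_SECRET_ACCESS_KEY",
}
# df = pd.read_parquet(f"s3://{bucket}/{path}", storage_options=storage_options)
```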

Andrew Gaul
gsmafra

1 Answer


I think you can pass a file-like object to pandas.read_parquet:

with fs.open(f's3://{bucket}/{path}') as fp:
    df = pd.read_parquet(fp)  # pd.read_parquet, not pq.read_parquet
0x26res