I'm downloading a file (more precisely, a set of Parquet files) from S3 and converting it to a pandas DataFrame. I'm doing that with the pandas function read_parquet and s3fs, like this:

df = pd.read_parquet(f's3://{bucket}/{path}')
However, so far I could only do that if I authenticate via environment variables or the AWS config file. Because of company standards, I would instead like to pass the credentials in code, the way we do with pyarrow.parquet:

import s3fs
import pyarrow.parquet as pq

fs = s3fs.S3FileSystem(key=config.AWS_ACCESS_KEY_ID, secret=config.AWS_SECRET_ACCESS_KEY)
df = pq.ParquetDataset(f's3://{bucket}/{path}', filesystem=fs).read().to_pandas()
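For context, here is a minimal sketch of the environment-variable approach that currently works for me (the credential values and the bucket/path are placeholders, not real values):

```python
import os

# Placeholder credentials for illustration only; in practice these come
# from the company secret store, not hard-coded strings.
os.environ["AWS_ACCESS_KEY_ID"] = "my-key-id"
os.environ["AWS_SECRET_ACCESS_KEY"] = "my-secret"

# s3fs picks the credentials up from the environment, so this call works:
# import pandas as pd
# df = pd.read_parquet(f"s3://{bucket}/{path}")
```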
Is there a way to do that with read_parquet? Is there something like a filesystem argument I can pass to it?
In case anyone is curious: I'm not using pq.ParquetDataset because it is too slow (I have no idea why).