I'm downloading a file (to be precise, a set of Parquet files) from S3 and converting it to a Pandas DataFrame. I'm doing that with the Pandas function read_parquet and s3fs, as described here:

df = pd.read_parquet(f's3://{bucket}/{path}')

However, so far I have only been able to do that by authenticating via environment variables or the AWS config file. Because of company standards, I would instead like to authenticate via variables in code, the way we do with pyarrow.parquet:

import s3fs
import pyarrow.parquet as pq
fs = s3fs.S3FileSystem(key=config.AWS_ACCESS_KEY_ID, secret=config.AWS_SECRET_ACCESS_KEY)
df = pq.ParquetDataset(f's3://{bucket}/{path}', filesystem=fs).read().to_pandas()

Is there a way to do that with read_parquet? Can't I use something like a filesystem argument with it?

In case anyone is curious, I'm not using pq.ParquetDataset because it is too slow (I have no idea why).
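Update: if I'm reading the docs right, newer pandas releases (1.1+) accept a storage_options dict that read_parquet forwards to s3fs, which seems to cover this. A sketch (the credential values are placeholders standing in for config.AWS_ACCESS_KEY_ID and config.AWS_SECRET_ACCESS_KEY):

```python
import pandas as pd

# storage_options is forwarded to s3fs.S3FileSystem (pandas >= 1.1).
# Placeholder credentials; substitute your own config values.
storage_options = {
    "key": "YOUR_ACCESS_KEY_ID",
    "secret": "YOUR_SECRET_ACCESS_KEY",
}
# df = pd.read_parquet(f"s3://{bucket}/{path}", storage_options=storage_options)
```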

Andrew Gaul
gsmafra

1 Answer


I think you can pass a file-like object to pandas.read_parquet:

with fs.open(f's3://{bucket}/{path}') as fp:
    df = pd.read_parquet(fp)  # pd.read_parquet, not pq.read_parquet
0x26res