I'm trying to read a list of Parquet files from an S3 bucket into a pyarrow table.
If I specify a single filename, I can do:
from pyarrow.parquet import ParquetDataset
import s3fs

dataset = ParquetDataset(
    "s3://path/to/file/myfile.snappy.parquet",
    filesystem=s3fs.S3FileSystem(),
)
And everything works as expected. However, if I pass the directory instead of the file:
dataset = ParquetDataset(
    "s3://path/to/file",
    filesystem=s3fs.S3FileSystem(),
)
I get:
pyarrow/_parquet.pyx:1036: in pyarrow._parquet.ParquetReader.open
pyarrow.lib.ArrowIOError: Invalid Parquet file size is 0 bytes
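
One thing that might help narrow this down, though it is only a guess and not something confirmed above: the "size is 0 bytes" message would be consistent with the prefix containing a zero-byte object (for example a directory marker or a `_SUCCESS` file) that gets opened as if it were a Parquet file. A minimal sketch of filtering such entries out before building the dataset follows. The helper name `non_empty_parquet_paths` and the sample listing are hypothetical; the listing mimics the `name`/`size`/`type` dicts that fsspec-style filesystems return from `ls(..., detail=True)`.

```python
def non_empty_parquet_paths(entries):
    """Keep only non-empty files ending in .parquet from an fsspec-style listing."""
    return [
        e["name"]
        for e in entries
        if e.get("type") == "file"
        and (e.get("size") or 0) > 0
        and e["name"].endswith(".parquet")
    ]


# Hypothetical listing, made up for illustration (not real bucket contents).
sample = [
    {"name": "path/to/file", "size": 0, "type": "file"},           # zero-byte marker
    {"name": "path/to/file/_SUCCESS", "size": 0, "type": "file"},  # empty flag file
    {"name": "path/to/file/myfile.snappy.parquet", "size": 1024, "type": "file"},
]
paths = non_empty_parquet_paths(sample)

# Against the real bucket (assumption, not run here):
# fs = s3fs.S3FileSystem()
# paths = non_empty_parquet_paths(fs.ls("s3://path/to/file", detail=True))
# dataset = ParquetDataset(paths, filesystem=fs)
```

If the filtered list loads cleanly, that would point to an empty object under the prefix rather than a problem with the Parquet files themselves.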