I can download a single snappy.parquet partition file with:
aws s3 cp s3://bucket/my-data.parquet/my-data-0000.snappy.parquet ./my-data-0000.snappy.parquet
And then use:
parquet-tools head my-data-0000.snappy.parquet
parquet-tools schema my-data-0000.snappy.parquet
parquet-tools meta my-data-0000.snappy.parquet
But I'd rather not download the file, and I'd rather not have to specify a particular snappy.parquet file. Instead the prefix: "s3://bucket/my-data.parquet"
Also what if the schema is different in different row groups across different partition files?
Following instructions here I downloaded a jar file and ran
hadoop jar parquet-tools-1.9.0.jar schema s3://bucket/my-data.parquet/
But this resulted in error: No FileSystem for schema "s3".
This answer seems promising, but only for reading from HDFS. Any solution for S3?