
I'm able to read a Parquet file stored on GCS thanks to this answer (see the first answer there), using the pd.read_parquet function with the pyarrow engine. Now I'd like to access the Parquet metadata without loading the data into a DataFrame. Is it possible to do that with pandas?

I found a solution using gcsfs and pyarrow directly, without pandas:

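import pyarrow.parquet as pq
import gcsfs

# Placeholders: replace with your own GCP project and object path
myprojectname = "my-project"
myfilepath = "my-bucket/path/to/file.parquet"

fs = gcsfs.GCSFileSystem(project=myprojectname)

# ParquetFile parses only the file footer (the metadata);
# the row groups are not decoded into a table
with fs.open(myfilepath, "rb") as f:
    myschema = pq.ParquetFile(f).schema

print(myschema)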
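This works without downloading the data because, in the Parquet format, the metadata sits in a footer at the end of the file: the file ends with a 4-byte little-endian footer length followed by the magic bytes PAR1. The following is a minimal stdlib-only sketch of how a reader locates that footer, assuming any seekable binary file object (such as the one fs.open returns). The names read_parquet_footer and MAGIC are hypothetical, and the sketch stops at the raw Thrift-encoded bytes; decoding them is what pq.ParquetFile does for you.

```python
import io
import struct

MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


def read_parquet_footer(f):
    """Return the raw (still Thrift-encoded) footer bytes of a Parquet file.

    Hypothetical helper: `f` is any seekable binary file object. Only the
    trailing 8 bytes and the footer itself are read; row data is skipped.
    """
    f.seek(0, io.SEEK_END)
    size = f.tell()
    # Minimum layout: leading magic (4) + footer length (4) + trailing magic (4)
    if size < 12:
        raise ValueError("file too small to be Parquet")
    # Last 8 bytes: 4-byte little-endian footer length, then the magic
    f.seek(size - 8)
    footer_len, magic = struct.unpack("<I4s", f.read(8))
    if magic != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    if footer_len > size - 12:
        raise ValueError("footer length exceeds file size")
    # The footer sits immediately before the length field
    f.seek(size - 8 - footer_len)
    return f.read(footer_len)
```

With gcsfs, the file object reads in blocks, so somewhat more than the footer may be transferred, but the bulk of the row data is not.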

How can this be made to work with partitioned data stored in multiple Parquet files, as in https://stackoverflow.com/questions/75529064/how-to-load-multiple-partition-parquet-files-from-gcs-into-pandas-dataframe? - Regressor, Feb 22 '23 at 14:58