I can read a Parquet file stored on GCS thanks to this answer (see the first answer). I used the pd.read_parquet
function with the pyarrow engine.
I would now like to access the Parquet metadata without downloading the data into a DataFrame. Is it possible to do that with pandas?

alcor
-
Hi, I have a similar question. Have you found a solution for this? – Chins Kuriakose Sep 23 '20 at 09:04
1 Answer
I found a solution using gcsfs, without pandas:
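import gcsfs
import pyarrow.parquet as pq

# myprojectname and myfilepath are placeholders: use your own GCP project ID and object path.
fs = gcsfs.GCSFileSystem(project=myprojectname)
# ParquetFile parses the footer metadata; the row data is not loaded into memory.
with fs.open(myfilepath) as f:
    myschema = pq.ParquetFile(f).schema
print(myschema)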
-
How to make this work if you have partitioned data stored in multiple parquet files like here - https://stackoverflow.com/questions/75529064/how-to-load-multiple-partition-parquet-files-from-gcs-into-pandas-dataframe – Regressor Feb 22 '23 at 14:58