
I just discovered Parquet and it met my "big" data processing / (local) storage needs:

  • faster than relational databases, which are designed to work over a network connection (adding overhead) and just aren't as fast as a format designed for local storage
  • compared to JSON or CSV: it stores data efficiently with real types (instead of everything being a string) and lets you read specific chunks of the file far more selectively than JSON or CSV allow

But to my dismay, while Node.js has a fully functional library for it, the only Parquet lib for Python seems to be, quite literally, a half-measure:

parquet-python is a pure-python implementation (currently with only read-support) of the parquet format ... Not all parts of the parquet-format have been implemented yet or tested e.g. nested data

So what gives? Is there something better than Parquet already supported by Python that lowers interest in developing a library to support it? Is there some close alternative?

J.Todd

1 Answer


Actually, you can read and write Parquet with pandas, which is commonly used for data jobs (though not for ETL on big data). For handling Parquet, pandas relies on one of two common packages:

pyarrow is a cross-platform library that provides an in-memory columnar format (Apache Arrow). Since Parquet is also a columnar format, pyarrow supports it, though pyarrow is a broader library that handles a variety of formats.

fastparquet is designed solely around the Parquet format, targeting Python-based big-data workflows.
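
As a minimal sketch (the file name and sample data are illustrative), the round trip through pandas looks like this; the engine argument selects which of the two packages does the work:

```python
import pandas as pd

# Build a small typed DataFrame: the dtypes survive the round trip,
# unlike CSV, where everything comes back as strings/objects.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [9.99, 4.50, 12.00],
    "label": ["a", "b", "c"],
})

# Write with pyarrow (engine="fastparquet" works the same way).
df.to_parquet("example.parquet", engine="pyarrow")

# Read back only the columns you need -- thanks to Parquet's columnar
# layout, the other columns don't have to be read at all.
subset = pd.read_parquet("example.parquet", engine="pyarrow",
                         columns=["id", "price"])
print(subset.dtypes)
```

If you omit the engine argument, pandas picks whichever of the two packages is installed.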

null
  • This post covers usage. https://stackoverflow.com/questions/33813815/how-to-read-a-parquet-file-into-pandas-dataframe – philosofool Jun 08 '21 at 13:32