I just discovered Parquet and it met my "big" data processing / (local) storage needs:
- faster than relational databases, which are designed to run over the network (creating overhead) and just aren't as fast as a solution designed for local storage
- compared to JSON or CSV: good for storing data efficiently into types (instead of everything being a string) and can read specific chunks from the file more dynamically than JSON or CSV
But to my dismay while Node.js has a fully functioning library for it, the only Parquet lib for Python seems to be quite literally a half-measure:
parquet-python is a pure-python implementation (currently with only read-support) of the parquet format ... Not all parts of the parquet-format have been implemented yet or tested e.g. nested data
So what gives? Is there something better than Parquet already supported by Python that lowers interest in developing a library to support it? Is there some close alternative?