After some searching I failed to find a thorough comparison of fastparquet and pyarrow.
I found this blog post (a basic comparison of speeds) and a GitHub discussion that claims that files created with fastparquet cannot be read by AWS Athena (by the way, is that still the case?).
When and why would I use one over the other? What are the major advantages and disadvantages of each?
My specific use case is processing data with dask, writing it to S3, and then reading/analyzing it with AWS Athena.
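
To make the use case concrete, here is a minimal sketch of the write step I have in mind. The helper names (`to_parquet_kwargs`, `write_for_athena`), the snappy compression choice, and the bucket path are placeholders I made up for illustration. The only thing that would change between the two libraries is the `engine` argument of dask's `to_parquet`, and which engines are accepted may depend on the installed dask version.

```python
def to_parquet_kwargs(engine):
    """Build keyword arguments for dask's DataFrame.to_parquet.

    Hypothetical helper: the compression and index settings are
    assumptions for this sketch, not recommendations.
    """
    if engine not in ("pyarrow", "fastparquet"):
        raise ValueError(f"unknown parquet engine: {engine!r}")
    return {"engine": engine, "compression": "snappy", "write_index": False}


def write_for_athena(ddf, path, engine="pyarrow"):
    """Write a dask DataFrame as a directory of parquet files.

    `path` can be a local directory or an s3:// URL (the latter
    assumes s3fs is installed and credentials are configured).
    """
    ddf.to_parquet(path, **to_parquet_kwargs(engine))


# Intended usage (placeholder bucket, not a real one):
#   import dask.dataframe as dd
#   import pandas as pd
#   pdf = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})
#   ddf = dd.from_pandas(pdf, npartitions=1)
#   write_for_athena(ddf, "s3://my-bucket/my-table/", engine="fastparquet")
```

Athena would then read the resulting directory as an external table, so what matters to me is whether the files written by each engine are readable there.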