I'm trying to write a Dataset object as a Parquet file using Java.
I followed this example to do so, but it is absurdly slow.
It takes ~1.5 minutes to write ~10 MB of data, so it isn't going to scale well when I want to write hundreds of MB of data.
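To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes write time scales linearly with data size, which I have not verified, and the 500 MB target is just a hypothetical figure standing in for "hundreds of MB". The class and method names are made up for illustration and have nothing to do with the Parquet API.

```java
public class WriteThroughput {
    // Observed throughput in MB/s for a write of `megabytes` that took `seconds`.
    static double throughputMbPerSec(double megabytes, double seconds) {
        return megabytes / seconds;
    }

    // Projected minutes to write `targetMb`, assuming time scales linearly with size.
    static double projectedMinutes(double targetMb, double sampleMb, double sampleSeconds) {
        return targetMb * sampleSeconds / sampleMb / 60.0;
    }

    public static void main(String[] args) {
        double sampleMb = 10.0;       // approximate size from my measurement
        double sampleSeconds = 90.0;  // approximate time (~1.5 minutes)
        double targetMb = 500.0;      // hypothetical target, not a measured value

        System.out.printf("throughput: %.2f MB/s%n",
                throughputMbPerSec(sampleMb, sampleSeconds));
        System.out.printf("projected: %.0f minutes for %.0f MB%n",
                projectedMinutes(targetMb, sampleMb, sampleSeconds), targetMb);
    }
}
```

With my approximate figures this works out to roughly 0.11 MB/s, or about 75 minutes for 500 MB if the scaling really is linear.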
I did some CPU profiling and found that 99% of the time is spent in the ParquetWriter.write() method.
I tried increasing the page size and block size of the ParquetWriter, but it doesn't seem to have any effect on performance. Is there any way to make this process faster, or is it just a limitation of the Parquet library?