What is the difference between none and uncompressed Parquet file compression? Is there a significant memory advantage between these two compression settings?
You can refer to this answer too: https://stackoverflow.com/questions/35789412/spark-sql-difference-between-gzip-vs-snappy-vs-lzo-compression-formats – Koushik Roy Jul 25 '22 at 16:15
2 Answers
There is no such thing as NONE Parquet file compression. The CompressionCodecName enum (https://github.com/apache/parquet-mr/blob/master/parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/CompressionCodecName.java) offers:
UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD
The class also shows:
    public static CompressionCodecName fromConf(String name) {
        if (name == null) {
            return UNCOMPRESSED;
        }
        return valueOf(name.toUpperCase(Locale.ENGLISH));
    }
So if a compression codec isn't specified, it defaults to UNCOMPRESSED.
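To see how that lookup treats different settings, here is a minimal standalone sketch. It does not use the real parquet-mr class: `Codec`, `CodecLookupSketch`, and `describe` are hypothetical names made up for illustration, and the enum simply copies the values listed above. Only the body of `fromConf` mirrors the quoted method.

```java
import java.util.Locale;

public class CodecLookupSketch {
    // Hypothetical stand-in for the enum values listed above;
    // this is NOT the real org.apache.parquet class.
    enum Codec { UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD }

    // Same logic as the quoted fromConf: null falls back to UNCOMPRESSED,
    // anything else must match an enum constant after upper-casing.
    static Codec fromConf(String name) {
        if (name == null) {
            return Codec.UNCOMPRESSED;
        }
        return Codec.valueOf(name.toUpperCase(Locale.ENGLISH));
    }

    // Illustrative helper: reports the resolved codec name, or "rejected"
    // when valueOf throws because no constant matches.
    static String describe(String name) {
        try {
            return String.valueOf(fromConf(name));
        } catch (IllegalArgumentException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println("null         -> " + describe(null));
        System.out.println("uncompressed -> " + describe("uncompressed"));
        System.out.println("snappy       -> " + describe("snappy"));
        System.out.println("none         -> " + describe("none"));
    }
}
```

With this stand-in, an unset (null) value and "uncompressed" both resolve to UNCOMPRESSED, while "none" matches no constant and is rejected. This only covers the quoted lookup in isolation; a framework that accepts its own option names may translate them before this point.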

Ben Watson
In Python pandas, at least, passing compression=None when exporting to Parquet means no compression (uncompressed). See https://pandas.pydata.org/pandas-docs/version/1.1/reference/api/pandas.DataFrame.to_parquet.html

mh0w