What is the difference between the "none" and "uncompressed" Parquet file compression settings? Is there a significant memory advantage to one over the other?

OneCricketeer
You can refer to this answer too: https://stackoverflow.com/questions/35789412/spark-sql-difference-between-gzip-vs-snappy-vs-lzo-compression-formats#:~:text=GZIP%20compresses%20data%2030%25%20more,GZip%20compression%20is%20still%20better. – Koushik Roy Jul 25 '22 at 16:15

2 Answers

There is no such thing as a NONE Parquet compression codec. The CompressionCodecName enum in parquet-mr (https://github.com/apache/parquet-mr/blob/master/parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/CompressionCodecName.java) offers:

UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD

The class also shows:

  public static CompressionCodecName fromConf(String name) {
     if (name == null) {
       return UNCOMPRESSED;
     }
     return valueOf(name.toUpperCase(Locale.ENGLISH));
  }

So if no compression codec is specified (the name is null), it defaults to UNCOMPRESSED.
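
To see what that method implies for the two names in the question, here is a minimal, self-contained sketch. It does not use parquet-mr itself: `CodecNameDemo`, `Codec` and `describe` are hypothetical stand-ins I made up, and the enum only copies the constant names listed above. Under those assumptions, a null name falls back to UNCOMPRESSED, while the literal string "none" matches no constant and is rejected by `valueOf`.

```java
import java.util.Locale;

public class CodecNameDemo {
    // Hypothetical stand-in for CompressionCodecName, using only the
    // constant names listed in this answer.
    enum Codec { UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD }

    // Same logic as the fromConf method quoted above.
    static Codec fromConf(String name) {
        if (name == null) {
            return Codec.UNCOMPRESSED;
        }
        return Codec.valueOf(name.toUpperCase(Locale.ENGLISH));
    }

    // Resolves a name and reports "rejected" when no constant matches
    // (Enum.valueOf throws IllegalArgumentException in that case).
    static String describe(String name) {
        try {
            return String.valueOf(fromConf(name));
        } catch (IllegalArgumentException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(null));           // UNCOMPRESSED
        System.out.println(describe("uncompressed")); // UNCOMPRESSED
        System.out.println(describe("snappy"));       // SNAPPY
        System.out.println(describe("none"));         // rejected
    }
}
```

This only models the quoted method. A tool that accepts "none" as a setting would have to translate it before reaching this lookup.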

Ben Watson

In Python pandas, at least, passing compression=None when exporting to Parquet means no compression, i.e. the file is written uncompressed.

https://pandas.pydata.org/pandas-docs/version/1.1/reference/api/pandas.DataFrame.to_parquet.html

mh0w