
The compression level in org.apache.hadoop.io.compress.zstd.ZStandardCompressor doesn't seem to work. I see the reset function being called in the ZStandardCompressor constructor, which in turn calls init(level, stream) to invoke the native function, which I believe is the only place the zstd level parameter gets set. In my test I am making sure this is called, but passing different levels such as 1, 5, 10, 20, etc. makes no difference: the output size is exactly the same.

Hadoop doesn't seem to use zstd-jni; it ships its own JNI bindings for zstd. I am sure people are using different compression levels in Hadoop. Could someone point me to what I should chase next?
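
For reference, a minimal sketch of the kind of check described above. It assumes the io.compression.codec.zstd.level configuration key (the name I believe Hadoop's ZStandardCodec reads; verify against your Hadoop version) and requires the Hadoop native library to be built with zstd support:

    import java.io.ByteArrayOutputStream;
    import java.util.Random;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionOutputStream;
    import org.apache.hadoop.io.compress.ZStandardCodec;

    public class ZstdLevelCheck {

        // Compress the same payload through ZStandardCodec at a given
        // level and return the compressed size. The config key below is
        // assumed from Hadoop's ZStandardCodec; verify it for your version.
        static int compressedSize(byte[] data, int level) throws Exception {
            Configuration conf = new Configuration();
            conf.setInt("io.compression.codec.zstd.level", level);
            ZStandardCodec codec = new ZStandardCodec();
            codec.setConf(conf);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (CompressionOutputStream cos = codec.createOutputStream(out)) {
                cos.write(data);
            }
            return out.size();
        }

        public static void main(String[] args) throws Exception {
            // Semi-compressible input (small alphabet), so higher levels
            // have room to beat lower ones; random bytes would not.
            byte[] payload = new byte[1 << 20];
            Random rnd = new Random(42);
            for (int i = 0; i < payload.length; i++) {
                payload[i] = (byte) rnd.nextInt(16);
            }
            System.out.println("level 1:  " + compressedSize(payload, 1));
            System.out.println("level 19: " + compressedSize(payload, 19));
        }
    }

If the two sizes come out identical, the level setting is not reaching the native compressor.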

ondway
  • Also highly interested in an answer. How did you pass level values? – bonnal-enzo Apr 20 '20 at 18:47
  • I created a CustomParquetWriter where we create an InternalParquetRecordWriter, similar to the current Parquet code, but I pass my own zstd compressor. I missed the comment earlier, so sorry for the late response – ondway May 08 '20 at 10:45

1 Answer


Given that people are finding this question without an answer, I am adding the solution I used. InternalParquetRecordWriter takes a compressor as an argument, so I integrated the zstd-jni library there by writing a compressor that implements BytesInputCompressor, as sketched below.
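
A sketch of what that compressor can look like. It assumes parquet-mr's CompressionCodecFactory.BytesInputCompressor interface and zstd-jni's Zstd.compress API; check both names against the versions you build with:

    import java.io.IOException;

    import com.github.luben.zstd.Zstd;
    import org.apache.parquet.bytes.BytesInput;
    import org.apache.parquet.compression.CompressionCodecFactory;
    import org.apache.parquet.hadoop.metadata.CompressionCodecName;

    public class ZstdJniCompressor
            implements CompressionCodecFactory.BytesInputCompressor {

        private final int level;

        public ZstdJniCompressor(int level) {
            this.level = level;
        }

        @Override
        public BytesInput compress(BytesInput bytes) throws IOException {
            // Delegate the actual compression to zstd-jni at the chosen level.
            byte[] compressed = Zstd.compress(bytes.toByteArray(), level);
            return BytesInput.from(compressed);
        }

        @Override
        public CompressionCodecName getCodecName() {
            // Pages are still labeled ZSTD, so any reader decompresses
            // them as ordinary zstd data.
            return CompressionCodecName.ZSTD;
        }

        @Override
        public void release() {
            // Nothing pooled to release in this simple sketch.
        }
    }

An instance of this can then be handed to InternalParquetRecordWriter (via a custom ParquetWriter, as in the comments above) in place of the compressor the stock codec factory would produce.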

ondway