
I'm using Hadoop Archive to reduce the number of files in my Hadoop cluster, but for data retention I want to keep my data as long as possible. The problem is that Hadoop Archive does not reduce the folder size (my folder contains multiple types of files, both small and large, so SequenceFile is not a good fit).

I tried options like `-D mapreduce.compress.map.output=true -D mapred.map.ouput.compress.codec=org.apache.hadoop.io.compress.GzipCodec`, but they didn't work.
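As a side note, those two properties only control compression of the *intermediate* map output (the shuffle data), not the files that end up inside the HAR, which is likely why the archive size did not change; a HAR stores its part files uncompressed. A sketch of the invocation with the current (Hadoop 2.x) property names, where the paths are placeholders:

```shell
# These -D flags compress intermediate map output only; the HAR part files
# are still stored uncompressed, so the archive is not smaller.
# /user/data and /user/archives are placeholder paths.
hadoop archive -archiveName data.har \
  -D mapreduce.map.output.compress=true \
  -D mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  -p /user/data in /user/archives
```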

Does anyone know a way to compress the output of Hadoop Archive, or can you suggest some way to achieve both goals (reducing both the size and the number of files)?

Any information is appreciated. Thanks so much.


1 Answer


You may use MapReduce output compression to compress the files first, and then run `hadoop archive` on the compressed directories.
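A minimal sketch of that two-step approach, assuming a Hadoop Streaming identity job and placeholder paths (`/user/data/in`, `/user/data/compressed`, `/user/archives`):

```shell
# Step 1 (sketch): rewrite the input with gzip-compressed job output
# using a map-only Hadoop Streaming identity job.
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  -D mapreduce.job.reduces=0 \
  -mapper cat \
  -input /user/data/in \
  -output /user/data/compressed

# Step 2: archive the compressed directory to cut the file count.
hadoop archive -archiveName data.har -p /user/data compressed /user/archives
```

Note that a text-oriented streaming job like this can corrupt binary files and does not preserve the original file names or directory layout, so verify it against your data before deleting the source.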

  • Can you give more details? Running another MapReduce job before running har can destroy the original directory structure. – dltu Jul 04 '16 at 11:11