
I'm using Hadoop Archive to reduce the number of files in my Hadoop cluster, but for data retention I want to keep my data as long as possible. The problem is that Hadoop Archive does not reduce the folder size (my folder contains multiple types of files, both small and large, so SequenceFile is not a good fit).

I tried options like `-D mapreduce.compress.map.output=true -D mapred.map.ouput.compress.codec=org.apache.hadoop.io.compress.GzipCodec`, but they didn't work.
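As a side note, those two properties only control compression of the *intermediate* map output (the shuffle data), not the files that end up inside the HAR, which is likely why the archive size did not change; a HAR stores its part files uncompressed. A sketch of the invocation with the current (Hadoop 2.x) property names, where the paths are placeholders:

```shell
# These -D flags compress intermediate map output only; the HAR part files
# are still stored uncompressed, so the archive is not smaller.
# /user/data and /user/archives are placeholder paths.
hadoop archive -archiveName data.har \
  -D mapreduce.map.output.compress=true \
  -D mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  -p /user/data in /user/archives
```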

Does anyone know a way to compress the output of Hadoop Archive, or can you suggest some way to achieve both goals (reducing both the size and the number of files)?

Any information is appreciated. Thanks so much.


1 Answer


You may use MapReduce output compression to compress the files first, and then run `hadoop archive` on the compressed directories.
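A minimal sketch of that two-step approach, assuming a Hadoop Streaming identity job and placeholder paths (`/user/data/in`, `/user/data/compressed`, `/user/archives`):

```shell
# Step 1 (sketch): rewrite the input with gzip-compressed job output
# using a map-only Hadoop Streaming identity job.
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  -D mapreduce.job.reduces=0 \
  -mapper cat \
  -input /user/data/in \
  -output /user/data/compressed

# Step 2: archive the compressed directory to cut the file count.
hadoop archive -archiveName data.har -p /user/data compressed /user/archives
```

Note that a text-oriented streaming job like this can corrupt binary files and does not preserve the original file names or directory layout, so verify it against your data before deleting the source.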

  • Can you give more details? Running another MapReduce job before running har can destroy the original directory structure. – dltu Jul 04 '16 at 11:11