6

Hive version: 1.2.1

Configuration:

set hive.execution.engine=tez;
set hive.merge.mapredfiles=true;
set hive.merge.smallfiles.avgsize=256000000;
set hive.merge.tezfiles=true;

HQL:

ALTER TABLE `table_name` PARTITION (partion_name1 = 'val1', partion_name2='val2', partion_name3='val3', partion_name4='val4') CONCATENATE;

I use the HQL above to merge the files of a specific table partition. However, after it runs there are still many files in the output directory, and their sizes are far below 256000000 bytes. How can I decrease the number of output files?

BTW, using MapReduce instead of Tez didn't work either.

Po Zhou

2 Answers

-2

You can set the number of reducers to 1, so that only one output file is created:

set mapred.reduce.tasks=1;
Ducaz035
  • Please check the remark in the question: "BTW, use MapReduce instead of Tez also didn't work." So he can use MapReduce as well if he wants to. In addition, you can use the configuration above for Tez too. – Ducaz035 Apr 19 '16 at 13:31
  • I can also assure you that it does solve the issue. Maybe Tez is a slightly different story, but it does work for MapReduce, and that is what the user asked about. – Ducaz035 Apr 19 '16 at 14:00
  • I have tried it right now and the result is that I have 25 files. Moreover, the triggered MapReduce job is a map-only job. Maybe you are using a different Hive version. I'm using Hive 1.2.1 and the files are ORC. Under these conditions, your solution doesn't work. – mgaido Apr 19 '16 at 14:09
  • Well, can you please try setting the number of mappers to 1? – Ducaz035 Apr 19 '16 at 14:21
  • Well, then I am out of ideas, sorry about that. – Ducaz035 Apr 19 '16 at 14:26
  • This does not work. – pavel_orekhov Jul 22 '23 at 23:05
-2

Maybe you can try INSERT OVERWRITE TABLE ... PARTITION ( ... ) SELECT * FROM ...

This statement can make use of the merge setting for Tez files (hive.merge.tezfiles).
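A minimal sketch of this approach, reusing the table name, partition names, and merge settings from the question. The column names col_a and col_b are hypothetical placeholders for the table's non-partition columns, and hive.merge.size.per.task is an additional Hive setting that the question did not use. With a fully static partition spec, the SELECT list should contain only the non-partition columns, so a plain SELECT * from the same table would probably not match. This has not been verified on Hive 1.2.1, so treat it as a starting point rather than a confirmed fix.

```sql
-- Settings taken from the question
set hive.execution.engine=tez;
set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=256000000;

-- Assumption: target size of each merged file (not part of the original question)
set hive.merge.size.per.task=256000000;

-- Rewrite the partition onto itself so the merge step can run after the job.
-- col_a and col_b are hypothetical; list every non-partition column of the table.
INSERT OVERWRITE TABLE `table_name`
PARTITION (partion_name1 = 'val1', partion_name2 = 'val2',
           partion_name3 = 'val3', partion_name4 = 'val4')
SELECT col_a, col_b
FROM `table_name`
WHERE partion_name1 = 'val1'
  AND partion_name2 = 'val2'
  AND partion_name3 = 'val3'
  AND partion_name4 = 'val4';
```

Because the statement overwrites the partition it reads from, it is probably safer to try it on a copy of the data first.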

Fabien
heyhey