Outputting hive table to HDFS as a single file

Question

I'm trying to output the contents of a Hive table to HDFS as a single CSV file. However, when I run the code below, the output is split into 5 separate files of ~500 MB each. Am I missing something needed to write the results as one single CSV file?

set hive.execution.engine=tez;
set hive.merge.tezfiles=true;
INSERT OVERWRITE DIRECTORY  "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable;

Read this answer please: https://stackoverflow.com/a/56596869/2700344 — leftjoin, Feb 01 '20 at 19:13

score 1 · Accepted Answer · answered Feb 01 '20 at 15:59

Add an ORDER BY clause to your SELECT query. A global ORDER BY forces Hive to run the final stage with a single reducer, which creates only one file in the HDFS directory.

INSERT OVERWRITE DIRECTORY  "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable order by <col_name>;
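To check the result, you can list the target directory from the Hive CLI, or from Beeline if your authorization setup allows `dfs` commands. This is a small sketch; the file name in the comment is what Hive typically produces for a single-reducer write, not something guaranteed here.

```sql
-- Run after the INSERT OVERWRITE DIRECTORY statement completes.
-- With a single reducer, the directory should contain one data file,
-- typically named 000000_0 (note: no .csv extension is added).
dfs -ls /dl/folder_name;
```

If you need a `.csv` name, rename the file afterwards; Hive does not add an extension on its own.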

Note:

If the output has a very large number of rows, the single reducer could take a very long time to finish, because one task has to sort and write all of the data.
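If the row order doesn't matter, an alternative that avoids the global sort is to make the merge step from the question actually fire. In my understanding, `hive.merge.tezfiles=true` only merges when the average output file size is below `hive.merge.smallfiles.avgsize` (default 16000000 bytes), so ~500 MB files are left as they are. The sketch below raises that threshold and `hive.merge.size.per.task`; the 3000000000-byte values are assumptions sized for the ~2.5 GB of output described in the question, not tuned recommendations.

```sql
set hive.execution.engine=tez;
-- Merge output files at the end of the Tez DAG.
set hive.merge.tezfiles=true;
-- The merge runs only when the AVERAGE output file size is below this value (bytes).
-- Assumed value: well above the ~500 MB per-file size seen in the question.
set hive.merge.smallfiles.avgsize=3000000000;
-- Target size of each merged file (bytes). Assumed value: above the total
-- output (~2.5 GB), so the merge can ideally produce a single file.
set hive.merge.size.per.task=3000000000;

INSERT OVERWRITE DIRECTORY "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable;
```

The merge still runs as an extra step after the query, and a single merged file is likely but not guaranteed, since the merge task may still split its input. Without ORDER BY, the row order in the file is not defined.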
