I'm trying to write the contents of a Hive table to HDFS as a single CSV file. However, when I run the code below, the output is split into 5 separate files of ~500 MB each. What am I missing to get the results written as one CSV file?

set hive.execution.engine=tez;
set hive.merge.tezfiles=true;
INSERT OVERWRITE DIRECTORY  "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable;
Rossy

1 Answer

Add an ORDER BY clause to your SELECT query. A global sort forces Hive to run the final stage with a single reducer, which writes only one file to the HDFS directory.

INSERT OVERWRITE DIRECTORY  "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable order by <col_name>;

Note:

If the output has a very large number of rows, the single reducer could take a very long time to finish, since every row has to pass through it.
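
If the single reducer turns out to be too slow, one thing that may be worth trying is raising the merge thresholds that hive.merge.tezfiles depends on. The sketch below assumes the five ~500 MB files were left unmerged because their average size is above the default hive.merge.smallfiles.avgsize. The byte values are illustrative guesses for roughly 2.5 GB of output, not tuned settings, and the merge step is not guaranteed to produce exactly one file.

```sql
set hive.execution.engine=tez;
set hive.merge.tezfiles=true;
-- Assumption: run the merge step when the average output file is below ~1 GB
set hive.merge.smallfiles.avgsize=1000000000;
-- Assumption: let merged files grow to ~4 GB, above the total output size
set hive.merge.size.per.task=4000000000;

INSERT OVERWRITE DIRECTORY "/dl/folder_name"
row format delimited fields terminated by ','
select * from schema.mytable;
```

This avoids the global sort, at the cost of an extra merge pass over the data after the query finishes.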

notNull