I have a partitioned ORC table in Hive. After loading the table with all possible partitions, I end up with multiple ORC files on HDFS, i.e. each partition directory on HDFS contains its own ORC file. For my use case, I need to combine the ORC files from all these partitions into a single big ORC file.
Can someone suggest a way to combine these ORC files (one per partition) into a single big ORC file?
I've tried creating a new non-partitioned ORC table from the partitioned table. It does reduce the number of files, but not down to a single file.
PS: Creating a table from another one runs as a map-only job, so setting the number of reducers to 1 with 'set mapred.reduce.tasks=1;' doesn't help.
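For reference, the attempt looked roughly like the sketch below. The table names orc_partitioned and orc_merged are placeholders, not the real ones.

```sql
-- Placeholder table names; substitute the real ones.
-- Requesting a single reducer has no effect here, because a plain
-- CREATE TABLE ... AS SELECT * runs as a map-only job.
SET mapred.reduce.tasks=1;

-- Copy the data from all partitions into one non-partitioned ORC table.
CREATE TABLE orc_merged
STORED AS ORC
AS SELECT * FROM orc_partitioned;
```

The result still has one output file per mapper, which is why the file count drops but does not reach one.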
Thanks