This is my first week with Hive and HDFS, so please bear with me.
Almost every approach I have seen so far for merging multiple ORC files suggests using ALTER TABLE with the CONCATENATE command.
But I need to merge multiple ORC files of the same table without having to ALTER the table. Another option is to create a copy of the existing table and then run ALTER TABLE on the copy, so that my original table remains unchanged. But I can't do that either, because of space and data redundancy concerns.
What I'm ideally trying to achieve is this: I need to transport these ORC files into a cloud environment as one file per table. So, is there a way to merge the ORC files on the fly while transferring them to the cloud? Can this be achieved with or without Hive, maybe directly in HDFS?