3

I am having 2 tables src and dest, same schema. I would like to overwrite partitions of dest using the updated data on src table.

On hive there are 2 different statements:

  1. Insert Overwrite

  2. Exchange Partition

for exchange partition it's problematic, because some of the partitions already exist.(as stated on documentation it is not possible to exchange if it is already exist). I also need support concurrency and atomicity, which means drop + exchange may not work but I am not sure about that.

for insert overwrite it's problematic because it's take a lot of time (lot of data copied, there is not HOT swap for the partition).

Is there any hybrid way to solve this issue ? like exchange overwriteor something like that? Is there any plan to add that functionality to hive if it is not exist.

David H
  • 1,346
  • 3
  • 16
  • 29
  • 1
    alter table partition location ... – leftjoin Nov 27 '16 at 18:09
  • this is not a good option. why? because it would be hard to manage where each partition would be when you have a lot. you want all partitions to be under the same location. – David H Nov 27 '16 at 18:41
  • http://stackoverflow.com/a/37744071/2700344 incremental update – leftjoin Nov 28 '16 at 14:00
  • what you are saying is that only option 1 is possible, and we aould have to deal with overhead of copy files from one table to another. Is there any way to use an optimization of hdfs HardLink if both schemas have same structure ? – David H Nov 28 '16 at 14:08
  • No I'm know of. Change partition location is fast. Of course it's not convenient to manage table with partitions in different locations. Insert overwrite is row by row slow by slow. There is no magic solution – leftjoin Nov 28 '16 at 14:17
  • Also you can delete and copy files. This is not atomic operation. – leftjoin Nov 28 '16 at 14:19

0 Answers0