I am using HDP 2.6.2 and hive.
I have a situation where I am updating a partitioned table from a large table based on a column and the query just doesnt perform well and I cant understand why. The insert statement below is an example
insert into partitioned_table partition(dt_month)
select * from large_table
where incremental_string_col > (select last_incremental_col from temp_tab)
Here I assume, the subquery in the where clause is executed once and the result is cached or the entire temp_tab table which is basically just one row is shipped to all nodes by the CBO but it just doesnt seem to work as well as putting the string value in as a literal!
Can I explicitly state that a table needs to be cached in hive? Can I explicitly state that a query needs to be executed just once and the result cached? What am I missing here?
I understand the column in the where being a string is not the best case but I cant help that.
Any help would be much appreciated!!