0

I am using HDP 2.6.2 and hive.

I have a situation where I am updating a partitioned table from a large table based on a column and the query just doesnt perform well and I cant understand why. The insert statement below is an example

insert into partitioned_table partition(dt_month) select * from large_table where incremental_string_col > (select last_incremental_col from temp_tab)

Here I assume, the subquery in the where clause is executed once and the result is cached or the entire temp_tab table which is basically just one row is shipped to all nodes by the CBO but it just doesnt seem to work as well as putting the string value in as a literal!

Can I explicitly state that a table needs to be cached in hive? Can I explicitly state that a query needs to be executed just once and the result cached? What am I missing here?

I understand the column in the where being a string is not the best case but I cant help that.

Any help would be much appreciated!!

1 Answers1

0

You can use cross map join with single row subquery then filter rows by inequality condition:

select * 
  from large_table l
       cross join (single_row_subquery) s
 where l.incremental_string_col>s.last_incremental_col;

Or calculate the subquery in a separated script and pass as hivevar variable like here: https://stackoverflow.com/a/37821218/2700344

leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • The map join with single row subquery also performs poorly sadly. I think I would have to go for the other option sadly.. But is there a more graceful way of doing this? Inside the same hive session maybe? – Subramaniam Ramasubramanian Nov 09 '17 at 13:05
  • If mapjoin works it should performe almost the same as without join – leftjoin Nov 09 '17 at 13:59
  • How can I ensure that mapjoin is working? Is there any way for me to see what optimizations have been applied? I tried to explicitly state this on the select as a hint as well but the performance was very poor. – Subramaniam Ramasubramanian Nov 14 '17 at 16:23
  • @SubramaniamRamasubramanian Study the plan, execute `explain select...` See this: https://stackoverflow.com/a/42972030/2700344. Check configuration responsible for mapjoin: https://community.hortonworks.com/content/supportkb/49639/hiveautoconvertjoin-true.html – leftjoin Nov 14 '17 at 18:44