1

I have a large table (1Tb of data) that needs to be joined with a smaller table (100k records)

SELECT st.id
FROM small_table st
LEFT JOIN large_table lt
        ON st.id = lt.id

In the above scenario, I am not able to control which table has to be cached into memory.I have tried using MAPJOIN , STREAMTABLE hints and also tried with parameters like conditional task size , small tbale size etc. Since the small table is on the left most side of the join, its not being cached into memory

Is there way to control the table that needs to be cached

Note: I cannot alter the table positions or the code:
Neither here...

enter image description here

...nor here...

enter image description here

Parameters used:

set hive.execution.engine=tez; 
set hive.tez.container.size=4096; 
set hive.merge.mapredfiles=true; 
set tez.shuffle-vertex-manager.min-src-fraction=0.25; 
set tez.shuffle-vertex-manager.max-src-fraction=0.75; 
set hive.exec.dynamic.partition.mode=nonstrict; 
set tez.am.resource.memory.mb=3200 ; 
set tez.am.java.opts=-server -Xmx3200m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC -XX:+UseConcMarkSweepGC ; 
SET hive.auto.convert.join=true; 
set hive.auto.convert.join.noconditionaltask.size=288435456;
leftjoin
  • 36,950
  • 8
  • 57
  • 116
PK25
  • 11
  • 2
  • If I change the postiions of the table ,m then a map join is enabled SELECT st.id FROM large_table lt LEFT JOIN small_table st ON st.id = lt.id – PK25 Mar 22 '19 at 23:16
  • Please add also parameters used – leftjoin Mar 23 '19 at 08:03
  • set hive.execution.engine=tez; set hive.tez.container.size=4096; set hive.merge.mapredfiles=true; set tez.shuffle-vertex-manager.min-src-fraction=0.25; set tez.shuffle-vertex-manager.max-src-fraction=0.75; set hive.exec.dynamic.partition.mode=nonstrict; set tez.am.resource.memory.mb=3200 ; set tez.am.java.opts=-server -Xmx3200m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC -XX:+UseConcMarkSweepGC ; SET hive.auto.convert.join=true; set hive.auto.convert.join.noconditionaltask.size=288435456; – PK25 Mar 24 '19 at 02:38

0 Answers0