Questions tagged [mapjoin]

For questions about the map-join optimization in Apache Hive: when one of the joined tables is small enough to fit into a mapper's memory, the join can be performed on the map side, with no need to launch a reducer.

Read more: Apache Hive Map-Join Wiki
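
As a rough illustration of the description above, a map join in HiveQL might be set up as sketched below. The table and column names (fact_sales, dim_store, store_id, amount, store_name) are hypothetical, and the threshold is only an example value; which settings actually take effect depends on the Hive version and execution engine.

```sql
-- Allow Hive to convert a common (reduce-side) join into a map join
-- when one input is below the small-table size threshold.
SET hive.auto.convert.join=true;

-- Small-table threshold in bytes (example value; tune for the cluster).
SET hive.mapjoin.smalltable.filesize=25000000;

-- Hypothetical tables: dim_store is assumed small enough to be loaded
-- into each mapper's memory, so the join needs no reduce phase.
SELECT f.store_id, f.amount, d.store_name
FROM fact_sales f
JOIN dim_store d
  ON f.store_id = d.store_id;
```

Prefixing the query with EXPLAIN is one way to check whether the plan contains a Map Join Operator rather than a reduce-side join.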

5 questions
9
votes
1 answer

Hive Map-Join configuration mystery

Could someone clearly explain the difference between the hive.auto.convert.join and hive.auto.convert.join.noconditionaltask configuration parameters? And also the corresponding size parameters: hive.mapjoin.smalltable.filesize and…
leftjoin
4
votes
1 answer

bucketing in non equi join in hive

Currently Hive does support non-equi joins. But as the cross product becomes pretty huge, I was wondering what the options are to tackle a join between a large fact table (257 billion rows, 37 TB) and a relatively smaller (8.7 GB) dimension table. In case of equi join I…
user3123372
1
vote
0 answers

How to cache the left most table in memory for a left outer join in hive

I have a large table (1 TB of data) that needs to be joined with a smaller table (100k records): SELECT st.id FROM small_table st LEFT JOIN large_table lt ON st.id = lt.id. In the above scenario, I am not able to control which table has to be…
PK25
1
vote
2 answers

Hive: a small query block joined with a big table, why can't it use map join?

I have a question about Hive map join. I know that when a small table is joined with a big table, using a map join is better, but when I have SQL like this: select a.col1, a.col2, a.col3, /* there has many columns from table a, ignore..*/ …
DMW
0
votes
1 answer

How to force enable broadcast join in Spark

I have a Spark SQL query that goes like this: SELECT /*+ BROADCASTJOIN (sbg_published.sk_e2e_web_all_vis) */ a.* FROM sbg_published.sk_e2e_web_all_vis a LEFT JOIN sbg_published.web_funnel_detail_v4 b ON a.col1 =…
Sankar