I have a question about Hive map joins. I know that when a small table is joined to a big table, a map join is the better choice, but I have a query like this:
select a.col1,
       a.col2,
       a.col3,
       /* many more columns from table a, omitted */
       b.col4,
       b.col5,
       b.col6
from a
inner join b
  on (a.id = b.id)
where b.date = '2018-02-10'
  and b.hour = '10';
Notes:
Table a is a big table with more than 100 million rows.
Table b is a big table with more than 100 million rows.
With the predicate applied, table b returns only about 1000 rows.
I expected this query to use a map join, but the execution plan shows the join being done on the reduce side.
Can anyone tell me why?
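
For reference, here is a minimal sketch of how the plan and the settings involved in automatic map join conversion could be inspected. It assumes a Hive session where `SET <name>;` without a value prints the current setting. The shortened column list is only for illustration, and no values shown or implied here come from the actual cluster.

```sql
-- Print the current values of the settings that control
-- automatic conversion of a common join into a map join.
SET hive.auto.convert.join;
SET hive.mapjoin.smalltable.filesize;
SET hive.auto.convert.join.noconditionaltask;
SET hive.auto.convert.join.noconditionaltask.size;

-- Show the execution plan for a shortened version of the query
-- (column list trimmed for illustration; the join and filters are unchanged).
EXPLAIN
select a.col1,
       b.col4
from a
inner join b
  on (a.id = b.id)
where b.date = '2018-02-10'
  and b.hour = '10';
```

In the EXPLAIN output, a "Map Join Operator" indicates a map-side join, while a "Join Operator" under a "Reduce Operator Tree" indicates a reduce-side join. Note that hive.mapjoin.smalltable.filesize is documented as a threshold on the input file size of the small table, so comparing it with the size of the data actually read from b may be a useful first check.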