Hive, a small query block join big table , why can't using map join?

Question

i have a question about hive mapjoin , i know when a small table join big table , using mapjoin is better, but when i got a sql like this

select a.col1,
       a.col2,
       a.col3, 
       /* there has many columns from table a, ignore..*/
       b.col4,
       b.col5,
       b.col6
  from a
 inner join b
    on (a.id = b.id)
 where b.date = '2018-02-10'
   and b.hour = '10';

Tips:
table b is big table , rows: 10000W+
table a is big table , rows: 10000W+
table b with predicate only return 1000 rows, i think this sql will using mapjoin , but execution plan is join in reduce side...

who can tell me why ??

The type of join (map or otherwise) has exactly nothing to do with how many rows are returned from your query. Can a given ID in your table A have multiple rows? — Andrew, Feb 20 '18 at 15:29

score 0 · Answer 1 · answered Feb 20 '18 at 14:38

I am not an expert in hive, but some times, the tools used as a client of SQL (i.e. MySQL Workbench) has implicit a limit 1000 in the settings. Try to specify a limit youself and force it to a higher value that 1000.

For example, check this image:

This is MySQL Workbench. The limit is automatically added to your query unless you specify a limit yourself.

leftjoin · Answer 2 · 2018-02-21T13:06:32.650

Try to move where clause into the subquery:

select a.col1,
       a.col2,
       a.col3, 
       /* there has many columns from table a, ignore..*/
       b.col4,
       b.col5,
       b.col6
  from a
 inner join (select * from b where b.date = '2018-02-10' and b.hour = '10' )b 
    on a.id = b.id
 ;

Also, intermediate filtered (temporary) table instead of subquery will work 100%, but this is not so efficient.

Also check these Hive configuration parameters:

set hive.auto.convert.join=true; --this enables map-join
set hive.mapjoin.smalltable.filesize=25000000; --size of table to fit in memory

Join will be converted to map-join if small table does not exceed the size specified by hive.mapjoin.smalltable.filesize parameter.

Hive, a small query block join big table , why can't using map join?

2 Answers2