-1

Does "RIGHT OUTER JOIN" and "LEFT OUTER JOIN" perform differently in HIVE?

For example, Table A is a small table. Table B is much larger and partitioned on col2.

Query 1: SELECT * FROM A RIGHT OUTER JOIN B on (A.col=B.col) WHERE B.col2>20131001

Query 2: SELECT * FROM A LEFT OUTER JOIN B on (A.col=B.col) WHERE B.col2>20131001

Query 1 only scans a small number of partitions, but Query 2 scans all partitions of Table B. Is it a bug or I miss something?

Thanks!

user1751221
  • 169
  • 1
  • 2
  • 5
  • You mean partitioning is by the `col2` (time) and the `B.col2>20131001` filter does not prune partitions on second case? Can you post the EXPLAIN plans for the two? – Remus Rusanu Oct 25 '13 at 06:59
  • You should also read [MapJoin & Partition Pruning](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=34015666) and see if your case fits the problem description. – Remus Rusanu Oct 25 '13 at 08:14
  • Possible duplicate of [What is the difference between "INNER JOIN" and "OUTER JOIN"?](https://stackoverflow.com/questions/38549/what-is-the-difference-between-inner-join-and-outer-join) – philipxy Nov 11 '18 at 00:07

2 Answers2

0

I experienced the same (don't know why this, bug or feature. i would consider this definately as a bug). Since left outer join is heavily used by me, i use the following workaround:

SELECT *
FROM A
LEFT OUTER JOIN
  (SELECT *
   FROM B
   WHERE B.COL2 >20131001) ON (A.COL=B.COL)

or

SELECT *
FROM A
LEFT OUTER JOIN B ON (A.COL= B.COL
                      AND B.COL2 > 20131001)

I'm working on Hive 0.11 ... would be interesting if this is fixed in later versions.

Wolli
  • 748
  • 1
  • 8
  • 12
0

Query 1 table B is preserved table so B's where predicate is pushdown.

Query 2 table B is not preserved table so B's where predicate will not pushdown.hive will scan the entire table B

see this for more detail OuterJoinBehavior