0

I need to use NOT IN query in Hive.

I have 3 tables A, B and C.

B with fields PRODUCT, ID and VALUE. C with fields ID and VALUE.

I need to write the rows from table B, which has no matching ID and VALUE fields in table C, to table A.

INSERT OVERWRITE TABLE A a SELECT * FROM B b LEFT SEMI JOIN C c ON (b.ID = c.ID AND b.VALUE = c.VALUE) where b.ID = NULL AND b.VALUE = NULL;

This suggestion from http://stackoverflow.com/questions/25041026/hive-left-semi-join-for-not-exists is not working, as I referred the right side table in WHERE clause, which should not be done.

How to form the equivalent query without referrring the right side table in the WHERE clause.

Any other solution?

Harbinger
  • 762
  • 2
  • 14
  • 36

2 Answers2

0

Solution:

Check the target tables have all the fields from both the tables. Because, here used *.

Then, It should be b.VALUE IS NULL and not = NULL.

The query should be like this:

INSERT OVERWRITE TABLE A a 
SELECT * FROM B b 
LEFT SEMI JOIN C c 
ON (b.ID = c.ID AND b.VALUE = c.VALUE) where 
b.ID IS NULL AND b.VALUE IS NULL;
Harbinger
  • 762
  • 2
  • 14
  • 36
0

Hive seems to support IN, NOT IN, EXIST and NOT EXISTS from 0.13

Select A.Id,A.*
From A
Where EXISTS (Select 1 From B where A.ID = B.ID)

The subqueries in EXIST and NOT EXISTS should have correlated predicates (like b.ID = a.ID in above sample) For more, refer Hive Wiki > Subqueries in the WHERE Clause

user2125117
  • 749
  • 6
  • 5