Possible Duplicate:
SQL Query JOIN with Table
CREATE EXTERNAL TABLE IF NOT EXISTS TestingTable1 (This is the MAIN table through which comparisons need to be made)
(
BUYER_ID BIGINT,
ITEM_ID BIGINT,
CREATED_TIME STRING
)
And this is the data in the above first table
**BUYER_ID** | **ITEM_ID** | **CREATED_TIME**
--------------+------------------+-------------------------
1015826235 220003038067 *2001-11-03 19:40:21*
1015826235 300003861266 2001-11-08 18:19:59
1015826235 140002997245 2003-08-22 09:23:17
1015826235 *210002448035* 2001-11-11 22:21:11
This is Second table in Hive- It also contains information about the items we are purchasing.
CREATE EXTERNAL TABLE IF NOT EXISTS TestingTable2
(
USER_ID BIGINT,
PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>
)
And this is the data in the above second table (TestingTable2
)-
**USER_ID** **PURCHASED_ITEM**
1015826235 [{"product_id":220003038067,"timestamps":"1004941621"}, {"product_id":300003861266,"timestamps":"1005268799"}, {"product_id":140002997245,"timestamps":"1061569397"},{"product_id":200002448035,"timestamps":"1005542471"}]
Compare TestingTable2
with TestingTable1
so that below scenario is fulfilled.
Find the PRODUCT_ID
AND TIMESTAMPS
from TestingTable2
WHICH IS NOT MATCHING WITH ITEM_ID
AND CREATED_TIME
from TestingTable1 CORRESPONDING TO BUYER_ID(USER_ID)
after comparing from TestingTable1
.
So If you look TestingTable2
data this(last) ITEM_ID 210002448035
from TestingTable1
is not matching with TestingTable2
PRODUCT_ID- 200002448035
data and similarly with timestamps. So I want to show the below result using the HiveQL query.
**BUYER_ID** | **ITEM_ID** | **CREATED_TIME** | **PRODUCT_ID** | **TIMESTAMPS**
--------------+------------------+--------------------------------+------------------------+----------------------
1015826235 *210002448035* 2001-11-11 22:21:11 200002448035 1005542471
1015826235 220003038067 *2001-11-03 19:40:21* 220003038067 1004941621
Can anyone help me with this. As I am new to HiveQL so having lot of Problem.
Updated:-
I have written this query, but it doesn't working the way I wanted to.
select * from
(select * from
(select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps
from testingtable2 LATERAL VIEW
explode(purchased_item) exploded_table as prod_and_ts)
prod_and_ts
LEFT OUTER JOIN testingtable1
ON ( prod_and_ts.user_id = testingtable1.buyer_id AND testingtable1.item_id = prod_and_ts.product_id
AND prod_and_ts.timestamps = UNIX_TIMESTAMP (testingtable1.created_time)
)
where testingtable1.buyer_id IS NULL)
set_a LEFT OUTER JOIN testingtable1
ON (set_a.user_id = testingtable1.buyer_id AND
( set_a.product_id = testingtable1.item_id OR set_a.timestamps = UNIX_TIMESTAMP(testingtable1.created_time) )
);
One More UPDATE
As per user1166147
comments. I wrote my query as per his query.
In hive I guess INNER JOIN
are written by simply JOIN
.
This is my below query.
select * from (select t2.buyer_id, t2.item_id, t2.created_time as created_time, subq.user_id, subq.product_id, subq.timestamps as timestamps
from
(select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps from testingtable2 lateral view explode(purchased_item) exploded_table as prod_and_ts) subq JOIN testingtable1 t2 on t2.buyer_id = subq.user_id
AND subq.timestamps = unix_timestamp(t2.created_time)
WHERE (subq.product_id <> t2.item_id)
union all
select t2.buyer_id, t2.item_id as item_id, t2.created_time, subq.user_id, subq.product_id as product_id, subq.timestamps
from
(select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps from testingtable2 lateral view explode(purchased_item) exploded_table as prod_and_ts) subq JOIN testingtable1 t2 on t2.buyer_id = subq.user_id
and subq.product_id = t2.item_id
WHERE (subq.timestamps <> unix_timestamp(t2.created_time))) unionall;
And after running the above query, I am getting zero result back.
One Final UPDATE:-
My Bad, I didn't have the accurate data in the tables so that is the reason I was not getting the result back. Yes it is working the actual above query.