2

This is Table1 data.

USER_ID     |    PRODUCT_ID    |   TIMESTAMPS
------------+------------------+-------------
1015826235       220003038067     *1004941621*
1015826235       300003861266      1005268799
1015826235       140002997245      1061569397
1015826235      *210002448035*     1005542471

If you compare the Table1 data with the below Table2 data, then the PRODUCT_ID in the last line of Table1 data is not matching with the ITEM_ID in the last line in the below Table2 data and also same with TIMESTAMPS in the first line of Table1 data is not matching with CREATED_TIME in the first line of Table2 data.

BUYER_ID     |    ITEM_ID         |   CREATED_TIME 
-------------+--------------------+------------------------
1015826235       220003038067       *2001-11-03 19:40:21*
1015826235       300003861266        2001-11-08 18:19:59
1015826235       140002997245        2003-08-22 09:23:17
1015826235      *200002448035*       2001-11-11 22:21:11

So I need to show the result like this for the above example after JOINING Table1 with Table2- Either of the below output is fine for me.

BUYER_ID    |     ITEM_ID       |    CREATED_TIME         |    PRODUCT_ID     |       TIMESTAMPS
------------+-------------------+-------------------------+-------------------+-----------------
1015826235       220003038067       *2001-11-03 19:40:21*      220003038067       *1004941621*
1015826235      *200002448035*       2001-11-11 22:21:11      *210002448035*       1005542471

OR

BUYER_ID   |     ITEM_ID       |    CREATED_TIME         |     USER_ID      |       PRODUCT_ID     |   TIMESTAMPS
-----------+-------------------+-------------------------+------------------+----------------------+------------------
1015826235       220003038067       *2001-11-03 19:40:21*      1015826235           220003038067       *1004941621*
1015826235      *200002448035*       2001-11-11 22:21:11       1015826235          *210002448035*       1005542471

Any help will be appreciated.

UPDATE:-

select * from (select * from (select user_id, prod_and_ts.product_id as 
product_id, prod_and_ts.timestamps as timestamps from testingtable2 LATERAL VIEW
explode(purchased_item) exploded_table as prod_and_ts) prod_and_ts LEFT OUTER 
JOIN table2 ON ( prod_and_ts.user_id = table2.buyer_id AND table2.item_id =
prod_and_ts.product_id AND prod_and_ts.timestamps = UNIX_TIMESTAMP
(table2.created_time) ) where table2.buyer_id IS NULL) set_a LEFT OUTER JOIN
table2 ON (set_a.user_id = table2.buyer_id AND  ( set_a.product_id = 
table2.item_id OR set_a.timestamps = UNIX_TIMESTAMP(table2.created_time) ));
arsenal
  • 23,366
  • 85
  • 225
  • 331
  • Didn't you already ask this question...? [SQL Query JOIN with Table](http://stackoverflow.com/questions/11386368/sql-query-join-with-table) – Brandon Boone Jul 09 '12 at 00:11
  • I asked it but in that I am posting Table1 as a seperate SQL query but here I have simplified it more by making Table1 and Table2 as separate so that people don't get confused. – arsenal Jul 09 '12 at 00:13
  • Then I would delete your previous question or you'll risk this question being closed as exact duplicate. – Brandon Boone Jul 09 '12 at 00:15
  • You can delete my previous question if you want. And keep this as a new as this is more specific and cleaner to understand what I want. – arsenal Jul 09 '12 at 00:16

2 Answers2

1

I believe the following should return a table like the second one you listed.

SELECT T2.BUYER_ID, T2.ITEM_ID, '*'+T2.CREATED_TIME+'*' as CREATED_TIME, T1.USER_ID, T1.PRODUCT_ID, '*'+T1.TIMESTAMPS+'*' as TIMESTAMPS
FROM Table1 T1
    INNER JOIN Table2 T2 ON T1.USER_ID = T2.BUYER_ID
    AND T1.PRODUCT_ID = T2.ITEM_ID
    AND T1.TIMESTAMPS <> T2.CREATED_TIME
UNION
SELECT T2.BUYER_ID, '*'+T2.ITEM_ID+'*' as ITEM_ID, T2.CREATED_TIME, T1.USER_ID, '*'+T1.PRODUCT_ID+'*' as PRODUCT_ID, T1.TIMESTAMPS
FROM Table1 T1
    INNER JOIN Table2 T2 ON T1.USER_ID = T2.BUYER_ID
    AND T1.TIMESTAMPS = T2.CREATED_TIME
    AND T1.PRODUCT_ID <> T2.ITEM_ID

The first statement gets all the rows where the BUYER_ID and USER_ID match, the PRODUCT_ID and ITEM_ID match, but the TIMESTAMPS and CREATED_TIME do NOT match.

The second gets all the rows where the BUYER_ID and USER_ID match, the TIMESTAMPS and CREATED_TIME match, but the PRODUCT_ID and ITEM_ID do NOT match.

This, of course, will could messy if someone buys the same product more than once, or if someone buys 2 different products at the exact same time.

  • Thanks Tee for commenting out, It is not possible to use LEFT or RIGHT join here in my case? as I am working with Hive and it supports sql like syntax but currently Hive does not support INNER JOIN, so I cannot use the above query. But I can use any other join apart from INNER JOIN. – arsenal Jul 09 '12 at 06:35
0

Seems to me that you want to LEFT JOIN the two tables on USER_ID = BUYER_ID, PRODUCT_ID = ITEM_ID, TIME1 = TIME2.

Some lines will match on all three fields and yield non-null values in all three columns. The others you want to report.

Then you want to run the same query with the tables in reverse order.

Thus you will have all nonmatching rows in the first table, and nonmatching rows in the second table.

SELECT * from table1 LEFT JOIN table2 ON ( user_id = buyer_id AND item_id = product_id AND ts1 = ts2)
       WHERE buyer_id IS NULL
UNION
select * from table1 RIGHT JOIN table2 ON ( user_id = buyer_id AND item_id = product_id AND ts1 = ts2)
       WHERE product_id IS NULL;

I don't think there is a way of reconciling the two; you're not even certain, in general, to have the same number of rows in the two queries. Maybe you can run a second query between the two nonmatched sets, noticing the lines sharing userid and productid, or userid and timestamp. But you should also track lines sharing neither?

For example, get nonmatching set 1 and try to match it with table2 based on product id or timestamp (both cannot match, or the first JOIN would have matched, and buyer_id would not have been NULL)

 SELECT * FROM
     ( SELECT table1.user_id, table1.product_id, table1.ts1 FROM table1 LEFT JOIN table2 
          ON ( user_id = buyer_id AND item_id = product_id AND ts1 = ts2) WHERE buyer_id IS NULL ) AS set_a
     LEFT JOIN table2
     ON ( set_a.user_id = table2.buyer_id AND 
        ( set_a.product_id = table2.item_id OR set_a.ts1 = table2.ts2 ));

The same running in reverse against table1 would fetch partial matches the other way.

If the rows are always the same in number and matched by userid, the query above will give the expected results. Except that I defined ts1 and ts2, in my example, as two timestamps; all comparisons between a timestamp and a date/time shall require the proper conversion function, e.g.

  ts1 = ts2

might become

  DATETIME(ts1) = ts2

or

  ts1 = UNIX_TIMESTAMP(ts2)

or the like, depending on the actual definition of ts1 and ts2 and the platform (e.g., MySQL vs SQL Server vs PostgreSQL...).

LSerni
  • 55,617
  • 10
  • 65
  • 107
  • Thanks Iserni for commenting out. I am not sure I understand what you just said now. The examples that you gave me will provide me the actual output I am looking for? And which query I should use from your comment to get my actual output. As I got confused a lot. – arsenal Jul 09 '12 at 01:11
  • I tried your above query but it didn't gave the output I was expecting. :( – arsenal Jul 09 '12 at 03:31
  • The second query should, except that you will need to tune the timestamp comparison (depends on how dates and time work in your platform: is it MySQL, SQL Server, or other? – LSerni Jul 09 '12 at 05:58
  • Thanks lserni for commenting out, let me make it more clear, here Table1 is not actually a Table1 all the Tble1 data I am getting it from this query `select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps from testingtable2 LATERAL VIEW explode(purchased_item) exploded_table as prod_and_ts` So as per my question I represented the output from the above query as Table1 data. So as per your second query, I represented my query as like this below and I have taken care of timestamp conversion. I updated my question with the query I am using. – arsenal Jul 09 '12 at 06:15
  • I am unclear on this alias syntax: seems to me that inside parentheses you should use, say, "exploded" instead of "", not to confuse with the *prod_and_ts* which is outside. Of course they're different scopes, so maybe that's OK: "SELECT user_id, .product_id AS product_id, ... FROM testingtable2 LATERAL VIEW ... AS ) *prod_and_ts*". Other than that, it should work (I'm not at all familiar with Oracle)... – LSerni Jul 09 '12 at 06:44
  • Apart from table alias query that I wrote is matching with your second query right? As I have converted Table1 to a particular query. So according to your the query I wrote looks good if we compared with your second query? – arsenal Jul 09 '12 at 07:00
  • Yes. Again within the narrow limits of my little oracle-fu :-) – LSerni Jul 09 '12 at 07:05
  • I think I forgot to tell you one more thing, I am not working with Oracle, I am working with Hive and Hive supports SQL like syntax. But the limitation with Hive is that it cannot use `OR` in JOIN query, so if you look at your last line in second query, you have `OR` in this line basically. `( set_a.product_id = table2.item_id OR set_a.ts1 = table2.ts2 ));`. So is there any way to get rid of this `OR` clause and we can do some other way, then this query will work like charm in Hive environment. Any suggestions? – arsenal Jul 09 '12 at 07:09
  • One more update on this, I tried doing this way, but I am getting zero records back. – arsenal Jul 09 '12 at 17:28
  • Only way I can think of is to run query twice and do an UNION (either manually or programmatically) of the two resultsets. The two ORs are mutually exclusive, so there should be no overlap. – LSerni Jul 09 '12 at 17:49