1

I am joining tbl_A to tbl_B, on column CustomerID in tbl_A to column Output in tbl_B which contains customer ID. However, tbl_B has all other information in related rows that I do not want to lose when joining. I tried to join using like, but I lost rows that did not contain customer ID in the output column.

enter image description hereenter image description here

Here is my join query in Hive:

select a.*, b.Output from tbl_A a
left join tbl_B b
On b.Output like concat('%', a.CustomerID, '%')

However, I lose other rows from output.

David Buck
  • 3,752
  • 35
  • 31
  • 35
Julie
  • 55
  • 1
  • 6
  • 2
    Interesting. You claim that this code was executed without an error? – David דודו Markovitz Jun 01 '17 at 20:29
  • Please read and act on [mcve]. PS If you want all rows from b, you need b left join a, or a right join b. – philipxy Jun 05 '17 at 10:07
  • 1
    If you are able to run this without an error could you share your Hive version please? As far as I know, Hive only allows equi-joins; in other words, it does not allow join phrase to be in form `ON b.Output like concat('%', a.CustomerID, '%')`. see this: https://stackoverflow.com/a/25850819/1434041 – Zahra Aug 16 '17 at 17:27

2 Answers2

1

You could also achieve the objective by a simple hive query like this :)

select a.*, b.Output 
from tbl_A a, tbl_B b
where b.Output like concat('%', a.CustomerID, '%')
D. Pardal
  • 6,173
  • 1
  • 17
  • 37
Vaibhav
  • 11
  • 1
0

I would suggest first extract all ID's from free floating field which in your case is 'Output' column in table B into a separate table. Then join this table with ID's to Table B again to populate in each row the ID and then this second joined table which is table B with ID's to table A.

Hope this helps.

Lilu
  • 108
  • 8