0

Usually, when performing a left join on two tables, all rows of the left table (A) are kept, and the right table (B) is only joined for matching join conditions, e.g. A.ID = B.ID. This works fine with small test data sets.

However, when I try to join set A with a considerably larger set B (i.e., the number of unique IDs in B is about 100 times the number of unique IDs in A) the result dataset includes only those rows of A that have matching IDs with B, which -- in my understanding -- is a normal (inner) join.

I get the desired result table by left joining set A with only those rows of set B that have matching IDs with set A, but I do not understand why the simple left join does not yield the same result.

Unfortunately, I cannot replicate the result with test data.

In general, are there possible reasons for a truncated left table after a left join?

EDIT:

set A:

ID  name
X1  AB
X2  XY
X3  VT
X4  ZY
X5  YZ
X6  KJ
X7  HA
X8  BK
X9  LM

set B:

ID  Var1
X1  blue
X11 red
X3  yellow
X4  blue
X12 yellow
X6  red
X7  orange
X7  blue
X8  green
X9  green
X10 blue

This gives a truncated set A:

select A.*, B.Var1 from
setA A 
left join setB B
on A.ID = B.ID
where B.Var1 = 'blue';

This gives what I want:

select A.*, B.Var1 from
setA A 
left join (select * from setB where Var1 = 'blue') B
on A.ID = B.ID;

I now understand that where placed after the join filters the join result, and that I need to see join and where as two separate tasks (correct?). However, it does not seem that natural to me (as a non-expert) that where B.Var1 = 'blue' filters the join result although it says B.Var1 and not only Var1, which I could understand more easily to refer to the join result. The B. suggests (to me) somehow to affect the left table used in the join.

Aki
  • 409
  • 2
  • 6
  • 15

2 Answers2

1

TL/DR Use setA A left join setB B on A.ID = B.ID and B.Var1 = 'blue'.

all rows of the left table (A) are kept, and the right table (B) is only joined for matching join conditions, e.g. A.ID = B.ID"

It's not clear what you are trying to say by that.

left join on is defined to return inner join on rows plus unmatched left table rows extended by nulls. If not all rows of A are in the result then you added a where (or an inner join on or right join on) after the left join on. Whenever you left join on, have clear in your mind what associated inner join on you want; that's what determines the on.

Does this mean that a where clause is not allowed in [outer] joins?

That (comment) is also strange because where is never "in" a (outer or inner) join, it is always after any joins. (You can think of inner join as being like cross join and on as being like where but with higher precedence.)

This gives what I want:

select A.*, B.Var1 from
setA A 
left join (select * from setB where Var1 = 'blue') B
on A.ID = B.ID;

So does this, the "natural" phrasing:

select A.*, B.Var1 from
setA A
left join setB B
on A.ID = B.ID
and B.Var1 = 'blue';

However, it does not seem that natural to me (as a non-expert) that

inner join on is defined to return cross join rows that meet the (entire) on condition. Another way to describe this is that for every possible pair of a row from each input a combined row is made but only if it satisfies the (entire) on condition is it output. Another way to describe this is that for every possible pair of a row from each input only if they satisfy the (entire) on condition are they combined into an output row.

Some people seem to use one of the latter two interpretations but have misconceptions. They think that inner join on outputs something different than a cross join followed by filtering by the on. Or they think that a join can only be on a condition involving (whatever they mean by that) columns from both inputs--as if each must be mentioned in order to be available to be combined and possibly output. But no--the on condition can be anything and is just evaluated for each possible input row pairing.

CROSS JOIN vs INNER JOIN: CROSS JOIN = (INNER) JOIN = comma (",")
What is the difference between “INNER JOIN” and “OUTER JOIN”?
(Beware of the nonsense and poor writing in so many other answers there.)

where B.Var1 = 'blue' filters the join result although it says B.Var1 and not only Var1, which I could understand more easily to refer to the join result. The B. suggests (to me) somehow to affect the left table used in the join.

This is more unclear & strange phrasing. It seems like you have misconceptions that you need to let go of. It would probably help if you tried to force yourself to clearly express the quoted statements and also how you have expected joins with on to work. (A necessary condition for becoming/being an expert is to not tolerate sloppiness in reading, thinking or writing.)

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • First of all, there was a typo in my comment (saying inner join while I meant outer join) and Alex pointed that out already. I don't think it is necessary to emphasize this in your answer. Regarding my question: I tried to explain my problem in words, rather than code, and I know that this often leads to misunderstandings. Sorry for that. However, I think I understand what a left join is, I am just not used to working with SQL. I also understand the importance of an MWE, but it can sometimes be diffficult to provide one. Thanks to @GordonLinoff the problem could be easily solved. – Aki Nov 23 '17 at 06:58
  • Hi. I didn't quote you re WHERE because of the INNER, by which I know you meant OUTER. (It's odd you mention it now since WHERE isn't "in" either kind of join.) (Indeed it's odd you talked about "in" after you thought you were unconfused.) I quoted that comment because, as I said, your expected output & "WHERE in a join" indicate fundamental misunderstandings. I wrote an answer that I felt appropriately addressed the misconceptions involved. PS Comments are ephemeral & are for requesting clarifications, questions & answers are permanent. – philipxy Nov 23 '17 at 07:27
  • Thanks, now I see what you mean. I understand that it must be hard for experts to deal with poorly formulated questions. You are right, I really didn't see that the 'where'-clause has nothing to do with the join itself, but alters the result "afterwards". And no, I am not using the deprecated notation. I will try to insert an MWE, but actually my question has been asked several times before. – Aki Nov 23 '17 at 07:33
  • The reason why I answered your question rather than linking as a duplicate to a generic exposition re select or left join vs right join or join vs where was that you seemed to have two specific misunderstandings that I thought made this question a bit different. – philipxy Nov 23 '17 at 07:53
0

Your query is technically fine conceptually. You simply forgot the (+) operator. It should be

select A.*, B.Var1 from
setA A 
left join setB B
on A.ID = B.ID
where B.Var1(+) = 'blue';  -- Notice the magic (+) symbol

The (+) symbol comes from the old style Oracle join syntax and might be pronounced as "Where B.Var1, if it exists, is equal to blue." Otherwise it'll only allow rows where the variable has a value that matches, which mean that it has to exist.

The more modern way to write it would be:

select    A.*, 
          B.Var1 
from      setA A 
left join setB B  on  A.ID   = B.ID
                  AND B.Var1 = 'blue';  

In other words, take it out of the where and keep it as part of the on condition by continuing it with and.

John Wu
  • 50,556
  • 8
  • 44
  • 80
  • @Aki My answer already gives this correct `inner join on` query. (I called it the ' "natural" ' one.) Re `(+)`: Even [Oracle itself deprecates that (very limited) notation](https://docs.oracle.com/cd/B28359_01/server.111/b28286/queries006.htm#SQLRF52335). Also It is used wrongly here: It is for joining via `,` for use in `where` ("Oracle recommends" bullet 2) without `join` (bullet 1) and must appear on all `B.` columns (bullets 3 & 4). (But--no [mcve].) It's irrelevant here--`on` was introduced for complete `outer join` generality. You just weren't using `on` correctly. – philipxy Nov 24 '17 at 22:39
  • @Aki Re "Your query is technically fine conceptually." On the contrary, whatever that is supposed to mean, see my answer re *mis*conceptions in the question. (See also "beware".) – philipxy Nov 24 '17 at 22:51
  • Well I certainly wouldn't recommend using the first example for your system. It's meant to help the OP follow a line of reasoning. The second example is the right way to write it. – John Wu Nov 24 '17 at 23:18
  • @philpxy It's hardly nonsense. The (+) is for a where clause where the row may not exist on the other side of the join, which is the exact problem OP is struggling with conceptually. But I do understand my answer does not help *you*, as that was not my goal. – John Wu Nov 26 '17 at 05:18
  • @JohnWu (cc @Aki) Read my first comment on this answer. It explains that your code using `(+)` is invalid. – philipxy Nov 26 '17 at 22:06