Usually, when performing a left join on two tables, all rows of the left table (A) are kept, and the right table (B) is only joined for matching join conditions, e.g. A.ID = B.ID. This works fine with small test data sets.
However, when I try to join set A with a considerably larger set B (i.e., the number of unique IDs in B is about 100 times the number of unique IDs in A) the result dataset includes only those rows of A that have matching IDs with B, which -- in my understanding -- is a normal (inner) join.
I get the desired result table by left joining set A with only those rows of set B that have matching IDs with set A, but I do not understand why the simple left join does not yield the same result.
Unfortunately, I cannot replicate the result with test data.
In general, are there possible reasons for a truncated left table after a left join?
EDIT:
set A:
ID name
X1 AB
X2 XY
X3 VT
X4 ZY
X5 YZ
X6 KJ
X7 HA
X8 BK
X9 LM
set B:
ID Var1
X1 blue
X11 red
X3 yellow
X4 blue
X12 yellow
X6 red
X7 orange
X7 blue
X8 green
X9 green
X10 blue
This gives a truncated set A:
select A.*, B.Var1 from
setA A
left join setB B
on A.ID = B.ID
where B.Var1 = 'blue';
This gives what I want:
select A.*, B.Var1 from
setA A
left join (select * from setB where Var1 = 'blue') B
on A.ID = B.ID;
I now understand that where
placed after the join
filters the join result, and that I need to see join
and where
as two separate tasks (correct?).
However, it does not seem that natural to me (as a non-expert) that where B.Var1 = 'blue'
filters the join result although it says B.Var1
and not only Var1
, which I could understand more easily to refer to the join result. The B.
suggests (to me) somehow to affect the left table used in the join.