123

What is the difference between an INNER JOIN and LEFT SEMI JOIN?

In the scenario below, why am I getting two different results?

The INNER JOIN result set is a lot larger. Can someone explain? I am trying to get the names within table_1 that only appear in table_2.

SELECT name
FROM table_1 a
    INNER JOIN table_2 b ON a.name=b.name

SELECT name
FROM table_1 a
    LEFT SEMI JOIN table_2 b ON (a.name=b.name)
Shiva
  • 20,575
  • 14
  • 82
  • 112
user3023355
  • 1,257
  • 2
  • 9
  • 6
  • 4
    The inner join will achieve your goal. I had never heard of a semi join until I saw this question. – Dan Bracuk Feb 12 '14 at 20:25
  • The `left semi join` should be returning more rows than the `inner join`. – Gordon Linoff Feb 12 '14 at 20:29
  • 2
    The `inner join` will return data only if there is a match between both tables. The `left join` will return data from the first table regardless if a matching record is found in the second table. – j03z Feb 12 '14 at 20:34
  • 19
    @GordonLinoff not necessarily, a `LEFT SEMI JOIN` will only return one row from the left, even if there are multiple matches in the right. An `INNER JOIN` will return multiple rows if there are multiple matching on the right. – D Stanley Feb 12 '14 at 20:48
  • Ask google and you shall receive http://sqlity.net/en/1348/a-join-a-day-the-left-semi-join/ – user2615302 Feb 12 '14 at 21:34
  • 2
    @j03z that can't be correct. If the purpose of the left hemi-join is 1) to return only the information in the left table (as others have said) and 2) to return rows from teh left table regardless of match (as I think you say) then that is just the original left table -- no join is necessary to accomplish that. I think others must be correct that the left hemi-join 1) only returns columns from the left table, 2) only returns rows that have a match in the right table, and 3) will return a single row from the left for one or more matches. – Carl G Sep 21 '15 at 20:20
  • it means select table1.* from table1 join table2 on table1.id=table2.id and select * from table1 left semi join table2 on table1.id=table2.id give the same result. is my understanding correct? – user2017 Dec 05 '17 at 17:58

6 Answers6

183

An INNER JOIN can return data from the columns from both tables, and can duplicate values of records on either side have more than one match. A LEFT SEMI JOIN can only return columns from the left-hand table, and yields one of each record from the left-hand table where there is one or more matches in the right-hand table (regardless of the number of matches). It's equivalent to (in standard SQL):

SELECT name
FROM table_1 a
WHERE EXISTS(
    SELECT * FROM table_2 b WHERE (a.name=b.name))

If there are multiple matching rows in the right-hand column, an INNER JOIN will return one row for each match on the right table, while a LEFT SEMI JOIN only returns the rows from the left table, regardless of the number of matching rows on the right side. That's why you're seeing a different number of rows in your result.

I am trying to get the names within table_1 that only appear in table_2.

Then a LEFT SEMI JOIN is the appropriate query to use.

D Stanley
  • 149,601
  • 11
  • 178
  • 240
  • Is there really such a thing as a `LEFT SEMI JOIN`? Isn't is just a `SEMI JOIN`? There's no sense to a `RIGHT SEMI JOIN`, is there? – ErikE Jul 18 '14 at 23:37
  • In [Hive](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins), yes. – D Stanley Jul 20 '14 at 02:24
  • 2
    great answer just what i was looking for. i would phrase the answer more accurately :"...an INNER JOIN will return one row for each matching **row of the right table**, while a LEFT SEMI JOIN... – Barak1731475 Apr 20 '15 at 18:10
  • 7
    The opposite of this is a LEFT ANTI JOIN that filters out the data from the right table in the left table according to a key. Thought I'd leave this nugget here for someone who might be looking! – shantanusinghal Oct 29 '17 at 19:55
114

Suppose there are 2 tables TableA and TableB with only 2 columns (Id, Data) and following data:

TableA:

+----+---------+
| Id |  Data   |
+----+---------+
|  1 | DataA11 |
|  1 | DataA12 |
|  1 | DataA13 |
|  2 | DataA21 |
|  3 | DataA31 |
+----+---------+

TableB:

+----+---------+
| Id |  Data   |
+----+---------+
|  1 | DataB11 |
|  2 | DataB21 |
|  2 | DataB22 |
|  2 | DataB23 |
|  4 | DataB41 |
+----+---------+

Inner Join on column Id will return columns from both the tables and only the matching records:

.----.---------.----.---------.
| Id |  Data   | Id |  Data   |
:----+---------+----+---------:
|  1 | DataA11 |  1 | DataB11 |
:----+---------+----+---------:
|  1 | DataA12 |  1 | DataB11 |
:----+---------+----+---------:
|  1 | DataA13 |  1 | DataB11 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB21 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB22 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB23 |
'----'---------'----'---------'

Left Join (or Left Outer join) on column Id will return columns from both the tables and matching records with records from left table (Null values from right table):

.----.---------.----.---------.
| Id |  Data   | Id |  Data   |
:----+---------+----+---------:
|  1 | DataA11 |  1 | DataB11 |
:----+---------+----+---------:
|  1 | DataA12 |  1 | DataB11 |
:----+---------+----+---------:
|  1 | DataA13 |  1 | DataB11 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB21 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB22 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB23 |
:----+---------+----+---------:
|  3 | DataA31 |    |         |
'----'---------'----'---------'

Right Join (or Right Outer join) on column Id will return columns from both the tables and matching records with records from right table (Null values from left table):

┌────┬─────────┬────┬─────────┐
│ Id │  Data   │ Id │  Data   │
├────┼─────────┼────┼─────────┤
│  1 │ DataA11 │  1 │ DataB11 │
│  1 │ DataA12 │  1 │ DataB11 │
│  1 │ DataA13 │  1 │ DataB11 │
│  2 │ DataA21 │  2 │ DataB21 │
│  2 │ DataA21 │  2 │ DataB22 │
│  2 │ DataA21 │  2 │ DataB23 │
│    │         │  4 │ DataB41 │
└────┴─────────┴────┴─────────┘

Full Outer Join on column Id will return columns from both the tables and matching records with records from left table (Null values from right table) and records from right table (Null values from left table):

╔════╦═════════╦════╦═════════╗
║ Id ║  Data   ║ Id ║  Data   ║
╠════╬═════════╬════╬═════════╣
║  - ║         ║    ║         ║
║  1 ║ DataA11 ║  1 ║ DataB11 ║
║  1 ║ DataA12 ║  1 ║ DataB11 ║
║  1 ║ DataA13 ║  1 ║ DataB11 ║
║  2 ║ DataA21 ║  2 ║ DataB21 ║
║  2 ║ DataA21 ║  2 ║ DataB22 ║
║  2 ║ DataA21 ║  2 ║ DataB23 ║
║  3 ║ DataA31 ║    ║         ║
║    ║         ║  4 ║ DataB41 ║
╚════╩═════════╩════╩═════════╝

Left Semi Join on column Id will return columns only from left table and matching records only from left table:

┌────┬─────────┐
│ Id │  Data   │
├────┼─────────┤
│  1 │ DataA11 │
│  1 │ DataA12 │
│  1 │ DataA13 │
│  2 │ DataA21 │
└────┴─────────┘
Amit Naidu
  • 2,494
  • 2
  • 24
  • 32
Abhishek Bansal
  • 1,481
  • 1
  • 8
  • 8
33

Tried in Hive and got the below output

table1

1,wqe,chennai,india

2,stu,salem,india

3,mia,bangalore,india

4,yepie,newyork,USA

table2

1,wqe,chennai,india

2,stu,salem,india

3,mia,bangalore,india

5,chapie,Los angels,USA

Inner Join

SELECT * FROM table1 INNER JOIN table2 ON (table1.id = table2.id);

1 wqe chennai india 1 wqe chennai india

2 stu salem india 2 stu salem india

3 mia bangalore india 3 mia bangalore india

Left Join

SELECT * FROM table1 LEFT JOIN table2 ON (table1.id = table2.id);

1 wqe chennai india 1 wqe chennai india

2 stu salem india 2 stu salem india

3 mia bangalore india 3 mia bangalore india

4 yepie newyork USA NULL NULL NULL NULL

Left Semi Join

SELECT * FROM table1 LEFT SEMI JOIN table2 ON (table1.id = table2.id);

1 wqe chennai india

2 stu salem india

3 mia bangalore india

note: Only records in left table are displayed whereas for Left Join both the table records displayed

Community
  • 1
  • 1
Kumar
  • 918
  • 9
  • 19
  • 1
    You should really put some duplicate ids in the RHS table to see the full difference though - e.g. if you have two values with id 1 in table 2, you will get 2 rows in the result for them with INNER JOIN and only one with LEFT SEMI JOIN. That shows that you can get a different number of rows returned, as well as only getting columns from the LHS table with the LSJ – mc110 Oct 20 '22 at 11:21
4

All above answers are correct. However in practice, it helps to associate the mental model of a filter when imagining LEFT SEMI JOIN.

The answer is a subset of rows from LEFT table, which have a match in RIGHT TABLE.

dsculptor
  • 341
  • 2
  • 8
1

Semi Joining Left with Right give you the rows that would have been kept in Left if you would join with Right

Left table: 
KEY:  1, 2, 3
VAL1: a, b, c

Right table: 
KEY:  2, 3, 4
VAL2: d, e, f

# Semi join: 
KEY:  2, 3
VAL1: b, c

# Actual inner join
KEY:  2, 3
VAL1: b, c
VAL2: d, e 
User12547645
  • 6,955
  • 3
  • 38
  • 69
0

They are exactly the same unless two things happen:

  1. There is/are duplicate records in the resulting table

In this case inner join brings in duplicates but left semi won't. So distinct of all rows in inner join = left semi

  1. you want access to right table field names

In inner join we can get both left and right table in the result. But with left semi you only get left table. Why is that? Check this query:

SELECT name
FROM table_left left
WHERE EXISTS(
    SELECT * FROM table_right right WHERE (left.name=right.name))

In this query we cannot get access to fields in the inner table (table_right). And this is how left semi is implemented behind the scenes. It is an 'is in' operation.

If these two conditions wont't happen in your inner join then they are exactly the same. So it depends how you test it. And hence the confusion. A one line answer to the question would be

One of them is not a join

Blue Clouds
  • 7,295
  • 4
  • 71
  • 112