Spark SQL documentation specifies that join() supports the following join types:

Must be one of: inner, cross, outer, full, full_outer, left, left_outer, right, right_outer, left_semi, and left_anti.

Spark SQL Join()

Is there any difference between outer and full_outer? I suspect not, and that they are just synonyms for each other, but I wanted to get clarity.

ZygD
jamiet

2 Answers

There is no difference between outer and full_outer - they are the same. See the following answer for a demonstration: What are the various join types in Spark?

dric

Spark v2.4.0 join code (underscores are stripped from the join type string before matching, so full_outer is matched as fullouter):

case "inner" => Inner
case "outer" | "full" | "fullouter" => FullOuter
case "leftouter" | "left" => LeftOuter
case "rightouter" | "right" => RightOuter
case "leftsemi" => LeftSemi
case "leftanti" => LeftAnti
case "cross" => Cross

So Spark really supports: Inner, FullOuter, LeftOuter, RightOuter, LeftSemi, LeftAnti, and Cross.
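To make the aliasing concrete, here is a minimal standalone sketch of that matching logic in plain Scala, with no Spark dependency. The names JoinKind and JoinKindParser are hypothetical stand-ins for this illustration, not Spark's own classes, and the lowercasing step is an assumption about how the input is normalized; only the underscore stripping and the case branches come from the excerpt above.

```scala
import java.util.Locale

// Hypothetical stand-ins for the join kinds listed above (not Spark's classes).
sealed trait JoinKind
case object Inner extends JoinKind
case object FullOuter extends JoinKind
case object LeftOuter extends JoinKind
case object RightOuter extends JoinKind
case object LeftSemi extends JoinKind
case object LeftAnti extends JoinKind
case object Cross extends JoinKind

object JoinKindParser {
  // Normalize the user-supplied string, then match it.
  // Assumption: input is lowercased; underscores are removed as the answer describes.
  def parse(joinType: String): JoinKind =
    joinType.toLowerCase(Locale.ROOT).replace("_", "") match {
      case "inner"                        => Inner
      case "outer" | "full" | "fullouter" => FullOuter
      case "leftouter" | "left"           => LeftOuter
      case "rightouter" | "right"         => RightOuter
      case "leftsemi"                     => LeftSemi
      case "leftanti"                     => LeftAnti
      case "cross"                        => Cross
      case _ =>
        throw new IllegalArgumentException(s"Unsupported join type '$joinType'")
    }
}
```

With this sketch, JoinKindParser.parse("outer"), JoinKindParser.parse("full") and JoinKindParser.parse("full_outer") all return the same FullOuter value, which is why the three spellings behave identically.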

Quick example, given:

+---+-----+
| id|value|
+---+-----+
|  1|   A1|
|  2|   A2|
|  3|   A3|
|  4|   A4|
+---+-----+

and:

+---+-----+
| id|value|
+---+-----+
|  3|   A3|
|  4|   A4|
|  4| A4_1|
|  5|   A5|
|  6|   A6|
+---+-----+

You get:

OUTER JOIN

+----+-----+----+-----+
|  id|value|  id|value|
+----+-----+----+-----+
|null| null|   5|   A5|
|null| null|   6|   A6|
|   1|   A1|null| null|
|   2|   A2|null| null|
|   3|   A3|   3|   A3|
|   4|   A4|   4|   A4|
|   4|   A4|   4| A4_1|
+----+-----+----+-----+

FULL_OUTER JOIN

+----+-----+----+-----+
|  id|value|  id|value|
+----+-----+----+-----+
|null| null|   5|   A5|
|null| null|   6|   A6|
|   1|   A1|null| null|
|   2|   A2|null| null|
|   3|   A3|   3|   A3|
|   4|   A4|   4|   A4|
|   4|   A4|   4| A4_1|
+----+-----+----+-----+
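The comparison above could be reproduced with something like the following sketch. It assumes a Spark 2.x or later dependency on the classpath and a local master; the object name, app name, and variable names are made up for this illustration.

```scala
import org.apache.spark.sql.SparkSession

object OuterVsFullOuter {
  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust master/appName for a real cluster.
    val spark = SparkSession.builder()
      .appName("outer-vs-full-outer")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The two input tables from the example above.
    val left  = Seq((1, "A1"), (2, "A2"), (3, "A3"), (4, "A4")).toDF("id", "value")
    val right = Seq((3, "A3"), (4, "A4"), (4, "A4_1"), (5, "A5"), (6, "A6")).toDF("id", "value")

    val sameId    = left("id") === right("id")
    val outer     = left.join(right, sameId, "outer")
    val fullOuter = left.join(right, sameId, "full_outer")

    outer.show()
    fullOuter.show()

    // Row order is not guaranteed, so compare the rows after sorting their string forms.
    val outerRows     = outer.collect().map(_.toString).sorted.toSeq
    val fullOuterRows = fullOuter.collect().map(_.toString).sorted.toSeq
    assert(outerRows == fullOuterRows, "outer and full_outer returned different rows")

    spark.stop()
  }
}
```

Sorting before comparing matters because show() and collect() make no ordering promise for a join result, so two identical result sets can print in a different row order.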
jgp