Is there a difference between left-anti join and except in Spark in my implementation below?
Using except when both DFs have 3 columns:
scala> val someDF5 = Seq(
| ("202003101750", "202003101700", 122),
| ("202003101800", "202003101700", 12),
| ("202003101750", "202003101700", 42),
| ("202003101810", "202003101700", 2),
| ("202003101810", "22222222", 222)
| ).toDF("number", "word", "value")
someDF5: org.apache.spark.sql.DataFrame = [number: string, word: string ... 1 more field]
scala> val someDF = Seq(
| ("202003101750", "202003101700",122),
| ("202003101800", "202003101700",12),
| ("202003101750", "202003101700",42)
| ).toDF("number", "word","value")
someDF: org.apache.spark.sql.DataFrame = [number: string, word: string ... 1 more field]
scala> someDF5.except(someDF).show()
+------------+------------+-----+
| number| word|value|
+------------+------------+-----+
|202003101810|202003101700| 2|
|202003101810| 22222222| 222|
+------------+------------+-----+
Using a left-anti join when one DF has 2 columns and the other has 3:
scala> val someDF4 = someDF.drop("value")
someDF4: org.apache.spark.sql.DataFrame = [number: string, word: string]
scala> someDF5.join(someDF4, Seq("number","word"), "left_anti").orderBy($"number".desc).show()
+------------+------------+-----+
| number| word|value|
+------------+------------+-----+
|202003101810|202003101700| 2|
|202003101810| 22222222| 222|
+------------+------------+-----+
The outputs match, and with left-anti I don't need the same number of columns in both tables. So, am I actually getting the same output from except and a left-anti join?
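For context, below is a minimal sketch of the one case where I would expect the two to diverge, namely duplicate rows on the left side. The names dupDF and keysDF are made up for illustration, and the snippet assumes a spark-shell session like the ones above (so spark.implicits._ is already in scope). As far as I understand, except behaves like SQL's EXCEPT DISTINCT, so it compares every column and de-duplicates the result, while left_anti only compares the join keys and keeps duplicate left-side rows.

// Sketch only: dupDF and keysDF are made-up names for illustration;
// run in the same spark-shell session as the snippets above.
val dupDF = Seq(
  ("202003101810", "22222222", 222),
  ("202003101810", "22222222", 222),   // exact duplicate row
  ("202003101750", "202003101700", 122)
).toDF("number", "word", "value")

val keysDF = Seq(
  ("202003101750", "202003101700", 122)
).toDF("number", "word", "value")

// Set operation: compares all columns and de-duplicates the result,
// so I expect a single ("202003101810", "22222222", 222) row.
dupDF.except(keysDF).show()

// Only compares the join keys and keeps duplicates from the left side,
// so I expect the ("202003101810", "22222222", 222) row twice.
dupDF.join(keysDF.select("number", "word"), Seq("number", "word"), "left_anti").show()

Is duplicate handling the only place where the two behave differently here?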