Ric S's answer is the best solution in some situations, like the one below.
Since Spark 2.0, you can use join with the 'left_anti' option:
df1.join(df2, on='key_column', how='left_anti')
This is the PySpark API, but I guess there is a corresponding function in Scala too.
This is very useful in some situations. Suppose we have two dataframes:
dataframe1
-----------------------------------------
|id | category |
-----------------------------------------
|1 | [{"type":"sport","name":"soccer"}] |
-----------------------------------------
dataframe2
-----------------------------------------------------------------------------
|id | category |
-----------------------------------------------------------------------------
|1 | [{"type":"sport","name":"soccer"}, {"type":"player","name":"ronaldo"}] |
-----------------------------------------------------------------------------
Here it is not possible to use exceptAll() or subtract(), because Spark's set operations cannot be applied to complex columns like this map-typed category.