I would like to get differences between two dataframe but returning the row with the different fields only. For example, I have 2 dataframes as follow:
val DF1 = Seq(
(3,"Chennai", "rahman",9846, 45000,"SanRamon"),
(1,"Hyderabad","ram",9847, 50000,"SF")
).toDF("emp_id","emp_city","emp_name","emp_phone","emp_sal","emp_site")
val DF2 = Seq(
(3,"Chennai", "rahman",9846, 45000,"SanRamon"),
(1,"Sydney","ram",9847, 48000,"SF")
).toDF("emp_id","emp_city","emp_name","emp_phone","emp_sal","emp_site")
The only difference between these two dataframe is emp_city
and emp_sal
for the second row.
Now, I am using the except
function which gives me the entire row as follow:
DF1.except(DF2)
+------+---------+--------+---------+-------+--------+
|emp_id| emp_city|emp_name|emp_phone|emp_sal|emp_site|
+------+---------+--------+---------+-------+--------+
| 1|Hyderabad| ram| 9847| 50000| SF|
+------+---------+--------+---------+-------+--------+
However, I need the output to be like this:
+---------+--------+-----+
|emp_id| emp_city|emp_sal|
+------+---------+-------+
| 1|Hyderabad| 50000|
+------+---------+-------+
Which shows the different cells as well as emp_id
.
Edit : if there is change in column then it should appear if there is no change then it should be hidden or Null