Hi all, I have two DataFrames. I am comparing their values and, based on the comparison, assigning a value to a new column in the joined DataFrame. All scenarios work fine except the comparison of null fields: if a value is null in both DataFrames, the result should be "Varified", but I am getting "Not Varified". Below are the data in my DataFrames, the code I am using, and the resulting DataFrame.
scala> df1.show()
+---+-----+---+--------+------+-------+
| id| name|age|lastname|  city|country|
+---+-----+---+--------+------+-------+
|  1|rohan| 26|  sharma|mumbai|  india|
|  2|rohan| 26|  sharma|  null|  india|
|  3|rohan| 26|    null|mumbai|  india|
|  4|rohan| 26|  sharma|mumbai|  india|
+---+-----+---+--------+------+-------+
scala> df2.show()
+----+------+-----+----------+------+---------+
|o_id|o_name|o_age|o_lastname|o_city|o_country|
+----+------+-----+----------+------+---------+
|   1| rohan|   26|    sharma|mumbai|    india|
|   2| rohan|   26|    sharma|  null|    india|
|   3| rohan|   26|    sharma|mumbai|    india|
|   4| rohan|   26|      null|mumbai|    india|
+----+------+-----+----------+------+---------+
val df3 = df1.join(df2, df1("id") === df2("o_id"))
  .withColumn("result", when(df1("name") === df2("o_name") &&
    df1("age") === df2("o_age") &&
    df1("lastname") === df2("o_lastname") &&
    df1("city") === df2("o_city") &&
    df1("country") === df2("o_country"), "Varified")
  .otherwise("Not Varified")).show()
+---+-----+---+--------+------+-------+----+------+-----+----------+------+---------+------------+
| id| name|age|lastname|  city|country|o_id|o_name|o_age|o_lastname|o_city|o_country|      result|
+---+-----+---+--------+------+-------+----+------+-----+----------+------+---------+------------+
|  1|rohan| 26|  sharma|mumbai|  india|   1| rohan|   26|    sharma|mumbai|    india|    Varified|
|  2|rohan| 26|  sharma|  null|  india|   2| rohan|   26|    sharma|  null|    india|Not Varified|
|  3|rohan| 26|    null|mumbai|  india|   3| rohan|   26|    sharma|mumbai|    india|Not Varified|
|  4|rohan| 26|  sharma|mumbai|  india|   4| rohan|   26|      null|mumbai|    india|Not Varified|
+---+-----+---+--------+------+-------+----+------+-----+----------+------+---------+------------+
I want id 2 to show as 'Varified' as well. However, because city is null in both DataFrames, it shows as 'Not Varified'. Can someone please guide me on how to modify my df3 query so that it also handles nulls and shows 'Varified' for id 2 in the result column?
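
One possible direction, offered as a sketch rather than a definitive fix: `===` follows SQL semantics, so comparing null with null yields null rather than true, and `when` then falls through to `otherwise`. Spark's `Column` also provides a null-safe equality operator, `<=>` (the method form is `eqNullSafe`), which treats two nulls as equal. The sample data below is rebuilt by hand from the tables above, and the use of `Option[String]` for the nullable columns and the local `SparkSession` settings are assumptions made only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

// In spark-shell a session already exists; getOrCreate() simply reuses it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("null-safe-compare")
  .getOrCreate()
import spark.implicits._

// Sample data mirroring the question; None becomes null in the DataFrame.
val df1 = Seq(
  (1, "rohan", 26, Option("sharma"), Option("mumbai"), "india"),
  (2, "rohan", 26, Option("sharma"), Option.empty[String], "india"),
  (3, "rohan", 26, Option.empty[String], Option("mumbai"), "india"),
  (4, "rohan", 26, Option("sharma"), Option("mumbai"), "india")
).toDF("id", "name", "age", "lastname", "city", "country")

val df2 = Seq(
  (1, "rohan", 26, Option("sharma"), Option("mumbai"), "india"),
  (2, "rohan", 26, Option("sharma"), Option.empty[String], "india"),
  (3, "rohan", 26, Option("sharma"), Option("mumbai"), "india"),
  (4, "rohan", 26, Option.empty[String], Option("mumbai"), "india")
).toDF("o_id", "o_name", "o_age", "o_lastname", "o_city", "o_country")

// <=> is null-safe equality: null <=> null is true, null <=> "x" is false.
val df3 = df1.join(df2, df1("id") === df2("o_id"))
  .withColumn("result",
    when(
      (df1("name") <=> df2("o_name")) &&
      (df1("age") <=> df2("o_age")) &&
      (df1("lastname") <=> df2("o_lastname")) &&
      (df1("city") <=> df2("o_city")) &&
      (df1("country") <=> df2("o_country")),
      "Varified"
    ).otherwise("Not Varified"))

df3.show()
```

Note that the sketch calls `show()` on a separate line: chaining `.show()` onto the assignment would make `df3` hold `Unit` instead of a DataFrame. With null-safe equality, a row where only one side is null (ids 3 and 4 here) still counts as a mismatch.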