I have a DataFrame as shown below:
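import spark.implicits._ // needed for .toDF on a local Seq; assumes a SparkSession named spark (as in spark-shell)

val df1 = Seq(
  ("EventId1", Some("GUID1"), Some("ID1"), None),
  ("EventId2", None, Some("ID1"), Some("Uid1")),
  ("EventId3", Some("GUID1"), None, None),
  ("EventId4", Some("GUID3"), Some("ID3"), None),
  ("EventId5", None, Some("ID3"), Some("Uid3"))
).toDF("EventId", "GUID", "WID", "SUid") // None values become nulls

df1.show()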
+--------+-----+----+----+
| EventId| GUID| WID|SUid|
+--------+-----+----+----+
|EventId1|GUID1| ID1|null|
|EventId2| null| ID1|Uid1|
|EventId3|GUID1|null|null|
|EventId4|GUID3| ID3|null|
|EventId5| null| ID3|Uid3|
+--------+-----+----+----+
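The challenge is to harmonize the last three ID columns (GUID, WID, SUid) across EventIds: events that share any ID value, either directly or through a chain of other events (EventId2 and EventId3 share nothing directly but are both linked to EventId1), should end up with the same GUID, WID and SUid. The expected result is: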
+--------+-----+---+----+
| EventId| GUID|WID|SUid|
+--------+-----+---+----+
|EventId1|GUID1|ID1|Uid1|
|EventId2|GUID1|ID1|Uid1|
|EventId3|GUID1|ID1|Uid1|
|EventId4|GUID3|ID3|Uid3|
|EventId5|GUID3|ID3|Uid3|
+--------+-----+---+----+
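Reading the expected output, the rule looks like connected components: treat every EventId and every ID value as a node, link each event to the IDs it carries, and fill each row with the IDs found in its component. The sketch below checks that reading on the sample rows in plain Scala (no Spark) with a small union-find. The names `HarmonizeIds`, `IdRow`, `UnionFind` and `harmonize` are hypothetical, and two behaviours are assumptions the question does not settle: values in different columns are kept apart even if their text matches, and when a component holds conflicting values for a column, the smallest one is picked.

```scala
import scala.collection.mutable

object HarmonizeIds {
  // One input row: (EventId, GUID, WID, SUid), mirroring the tuples used to build df1
  type IdRow = (String, Option[String], Option[String], Option[String])

  // Minimal union-find (disjoint-set) over string keys, with path compression
  final class UnionFind {
    private val parent = mutable.Map.empty[String, String]

    def find(x: String): String = {
      val p = parent.getOrElseUpdate(x, x) // unseen keys start as their own root
      if (p == x) x
      else {
        val root = find(p)
        parent(x) = root // point x straight at its root
        root
      }
    }

    def union(a: String, b: String): Unit = {
      val ra = find(a)
      val rb = find(b)
      if (ra != rb) parent(ra) = rb
    }
  }

  // Assumption: if a component holds several values for a column, take the smallest
  private def pick(values: Seq[Option[String]]): Option[String] =
    values.flatten.sorted.headOption

  def harmonize(rows: Seq[IdRow]): Seq[IdRow] = {
    val uf = new UnionFind
    def eventKey(r: IdRow): String = "EV:" + r._1

    // Link each event to every ID it carries; the column prefixes keep a GUID
    // and a WID with the same text apart (an assumption about the data)
    rows.foreach { r =>
      uf.find(eventKey(r))
      val ids = Seq(r._2.map("GUID:" + _), r._3.map("WID:" + _), r._4.map("SUid:" + _)).flatten
      ids.foreach(k => uf.union(eventKey(r), k))
    }

    // One (GUID, WID, SUid) triple per connected component
    val filled = rows.groupBy(r => uf.find(eventKey(r))).map { case (root, group) =>
      root -> ((pick(group.map(_._2)), pick(group.map(_._3)), pick(group.map(_._4))))
    }

    // Rewrite every row with its component's IDs, keeping the input order
    rows.map { r =>
      val (guid, wid, suid) = filled(uf.find(eventKey(r)))
      (r._1, guid, wid, suid)
    }
  }

  def main(args: Array[String]): Unit = {
    val rows: Seq[IdRow] = Seq(
      ("EventId1", Some("GUID1"), Some("ID1"), None),
      ("EventId2", None, Some("ID1"), Some("Uid1")),
      ("EventId3", Some("GUID1"), None, None),
      ("EventId4", Some("GUID3"), Some("ID3"), None),
      ("EventId5", None, Some("ID3"), Some("Uid3"))
    )
    harmonize(rows).foreach(println)
  }
}
```

This runs on a single machine, so it only suits data small enough to collect to the driver. For large DataFrames, one common route is a distributed connected-components algorithm (for example GraphFrames' `connectedComponents`, over edges between EventIds and ID values), followed by a join back on EventId.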
Any idea how this can be achieved efficiently in Spark?