I have two DataFrames (railroadGreaterFile, railroadInputFile).
I want to drop records from railroadGreaterFile if the value in its MEMBER_NUM column matches a value in the MEMBER_NUM column of railroadInputFile.
Below is what I used:
val columnrailroadInputFile = railroadInputFile.withColumn("check", lit("check"))
val railroadGreaterNotInput = railroadGreaterFile
  .join(columnrailroadInputFile, Seq("MEMBER_NUM"), "left")
  .filter($"check".isNull)
  .drop($"check")
With the above, the matching records are dropped. However, I noticed that the schema of railroadGreaterNotInput is a combination of the schemas of both DataFrames (railroadGreaterFile and railroadInputFile), so when I try to write the data in railroadGreaterNotInput to a file, I get the error below:
org.apache.spark.sql.AnalysisException: Reference 'GROUP_NUM' is ambiguous, could be: GROUP_NUM#508, GROUP_NUM#72
What should I do so that railroadGreaterNotInput contains only the fields from railroadGreaterFile?
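
For reference, here is a minimal sketch of one direction that might work: Spark's "left_anti" join type keeps only the left-side rows that have no match on the join key, and returns only the left-side columns. The sample rows, the local SparkSession, and the object name AntiJoinSketch are assumptions for illustration. Only the DataFrame names and the MEMBER_NUM and GROUP_NUM columns come from the question.

```scala
import org.apache.spark.sql.SparkSession

object AntiJoinSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption: Spark 2.x or later)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("anti-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the two real DataFrames;
    // both share MEMBER_NUM and GROUP_NUM, as the error message suggests
    val railroadGreaterFile = Seq(("1", "A"), ("2", "B"), ("3", "C"))
      .toDF("MEMBER_NUM", "GROUP_NUM")
    val railroadInputFile = Seq(("2", "X"))
      .toDF("MEMBER_NUM", "GROUP_NUM")

    // left_anti: keep rows of the left DataFrame whose MEMBER_NUM has no
    // match on the right; no right-side columns are added to the result
    val railroadGreaterNotInput = railroadGreaterFile
      .join(railroadInputFile, Seq("MEMBER_NUM"), "left_anti")

    // The schema should list only the columns of railroadGreaterFile
    railroadGreaterNotInput.printSchema()
    railroadGreaterNotInput.show()

    spark.stop()
  }
}
```

If this behaves as expected, the helper "check" column and the isNull filter would no longer be needed, and there would be no duplicate GROUP_NUM column to trigger the ambiguity error on write.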