-2

We are moving data from table1 to table2. I need to create a reconciliation report if the data in table1 exists in table2.

Example:

val df1 = """(select col1, col2, col3, col4 from table1)""" 
val df2 = """(select col21,col22,col23,c24 from table2)"""

Now I need to check if the data in table1 exists in table2 and write to a report if it is missing.

ernest_k
  • 44,416
  • 5
  • 53
  • 99
  • How about an inner join between `df1` and `df2`? – ernest_k Feb 07 '20 at 15:39
  • Welcome to SO! I think you could improve your chances of getting an answer with a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) – DCTID Feb 07 '20 at 16:18
  • @ernest_k I need the missing records from second table, the comparison needs to be attribute level. Inner join might not help me with that. – Sahi12089 Feb 07 '20 at 17:14
  • `leftanti` join? – blackbishop Feb 07 '20 at 17:44
  • Left anti works, but will it effect the performance if I write around 10 joins for 10 different columns. I have to compare every col in table 1 against col in table 2. table1.col1 vs table2.col1,table1.col2 vs table2.col2, ......,table1.coln vs table2.coln. – Sahi12089 Feb 07 '20 at 18:59
  • @Sahi12089 have you checked the solution? – venus Feb 10 '20 at 17:13

1 Answers1

0

Left anti join is the elegant way to filter rows that exist in dataframe1 but does not exist in dataframe2 by comparing one or more columns of two dataframes.
As you are not comfortable with the left anti join solution hence lets go ahead with an alternate way.

Let's assume we have two dataframes having common column names:

val DF1 = Seq(
  ("Ravi", 20),
  ("Kiran", 25),
  ("Gaurav", 30),
  ("Vinay", 35),
  ("Mahesh", 40)
).toDF("name", "age")
val DF2 = Seq(
  ("Ravi", 20),
  ("Mahesh", 40)
).toDF("name", "age")
DF1.except(DF2).show()

output screenshot

Also check a beautiful solution given by Tzach Zohar by using left-anti-join-in-spark

venus
  • 1,188
  • 9
  • 18