-2

I'm comparing the data ingested into a Hive table with the data in its source and storing the differences in MariaDB. The tables have no primary keys, and I would like an optimised solution. I've used the except method to find the rows that differ, but I'm finding it difficult to print out which columns differ within the same row.

  • 1
    This might help: https://stackoverflow.com/questions/44338412/how-to-compare-two-dataframe-and-print-columns-that-are-different-in-scala – GRVPrasad Feb 26 '20 at 18:12
  • I checked the above link. It does a column-wise comparison, not a row-wise one, and I need to print the rows that differ along with the names of the columns that differ. – Avinash Sreethalam Feb 27 '20 at 07:57

1 Answer

0

As far as I can tell, this problem can't be solved without a primary key. Without one, there is no way to tell which row of one DataFrame corresponds to which row of the other, so each row of one DataFrame is potentially different from every row of the other, and in practice you wouldn't want to report differences against every row of the other DataFrame.
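
To make the point concrete, here is a minimal sketch in plain Python (not Spark), using small in-memory tuples as stand-in rows. The function names `row_level_diff` and `column_level_diff`, the sample data, and the `id` key column are all hypothetical and only for illustration. Without a key, the best you can do is a row-level multiset difference, which is roughly what an except-style comparison reports. A column-level report only becomes possible once some column (or combination of columns) is assumed to identify a row.

```python
from collections import Counter


def row_level_diff(source_rows, target_rows):
    """Treat each side as a multiset of whole rows and return what is left
    over on each side. No key is needed, but it cannot say which columns differ."""
    src = Counter(source_rows)
    tgt = Counter(target_rows)
    # Counter subtraction keeps only rows with a positive remaining count.
    return src - tgt, tgt - src


def column_level_diff(source_rows, target_rows, columns, key):
    """Pair rows by an ASSUMED unique key column, then list the columns
    whose values differ for each key present on both sides."""
    k = columns.index(key)
    src = {row[k]: row for row in source_rows}
    tgt = {row[k]: row for row in target_rows}
    diffs = {}
    for key_value in src.keys() & tgt.keys():
        changed = [
            col
            for col, a, b in zip(columns, src[key_value], tgt[key_value])
            if a != b
        ]
        if changed:
            diffs[key_value] = changed
    return diffs


# Hypothetical sample data; "id" is assumed to be unique here.
columns = ["id", "name", "city"]
source = [(1, "ann", "Pune"), (2, "bob", "Delhi")]
target = [(1, "ann", "Pune"), (2, "bob", "Mumbai")]

only_in_source, only_in_target = row_level_diff(source, target)
per_key_changes = column_level_diff(source, target, columns, "id")
```

With this data, the row-level diff only tells you that one row is unmatched on each side, while the keyed diff can say that `city` changed for `id` 2. If your tables really have no column combination that identifies a row, the second function has nothing reliable to pair on, which is the limitation described above.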