
I have an employee table with a few records, as shown below:

+----+------+---------+  
| Id | Name | Address |  
+----+------+---------+  
|  1 | AA   | Hyd     |  
|  2 | BB   | Bglr    |  
|  3 | CC   | Chn     |  
|  4 | DD   | Pune    |  
+----+------+---------+

Now I have received a new employee table. I have to combine both tables (old + new) and then perform the following tasks:
1. Remove duplicate records
2. Replace old records with updated records
3. Add new records to my old employee table

My new table is as follows:

+----+------+---------+  
| Id | Name | Address |  
+----+------+---------+  
|  1 | AA   | Hyd     |  
|  2 | BB   | Bglr    |  
|  3 | CC   | US      |  
|  4 | DD   | IND     |  
|  5 | EE   | Hyd     |  
|  6 | FF   | Chn     |  
+----+------+---------+

Please help me out. I want to do this using Spark DataFrames in Scala. Thanks in advance.

Prashant
  • And reading the docs is not an option? I mean, joining DataFrames and using `distinct` is clearly discussed and exemplified there – UninformedUser May 29 '19 at 07:35
  • And adding records just means creating another DataFrame and using the union of both, given that DataFrames are immutable – UninformedUser May 29 '19 at 07:37
  • Please provide code to replace existing records with updated records – Madhu Telemedia May 29 '19 at 17:45
  • use the search function and read existing answers: https://stackoverflow.com/questions/32357774/scala-how-can-i-replace-value-in-dataframes-using-scala – UninformedUser May 29 '19 at 17:55
  • Assume I have thousands of records. I don't want to update each record individually (once I join the two tables I have more than 2000 records, so it is difficult to check each updated record and replace the old one). I just want to run a single query that replaces all old records with the updated ones, adds the new records to the table, and finally removes the duplicate records. Please share the code; I'm new to Spark and Scala – Madhu Telemedia Jun 05 '19 at 06:49
  • As I said, just write a query that creates a new DataFrame, given that DataFrames are immutable. There is neither a "replace" nor an "add" function; create new DataFrames and then use the union of them. And removing duplicates is trivial: read the docs and find the keyword `distinct` ... – UninformedUser Jun 05 '19 at 12:39
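
The approach described in the comments (build a new DataFrame from a union rather than modifying the old one) could look roughly like the sketch below. It rests on assumptions that are not stated in the question: `Id` uniquely identifies an employee, and a row from the new table always wins over an old row with the same `Id`. The names `EmployeeUpsert` and `upsert` are made up for illustration, and `unionByName` needs Spark 2.3 or later. Note that a plain `union` followed by `distinct` would not be enough here, because the old and new rows for `Id` 3 and 4 differ in `Address` and would both survive; the sketch therefore drops the superseded old rows with a `left_anti` join first.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object EmployeeUpsert {

  // Hypothetical helper: rows in `updates` replace rows in `current`
  // that share the same key; all other rows from both sides are kept.
  def upsert(current: DataFrame, updates: DataFrame, key: String): DataFrame = {
    // Keep only the old rows whose key does NOT appear in the new table.
    val untouched = current.join(updates.select(key), Seq(key), "left_anti")
    // Guard against repeated keys inside the new table itself,
    // then append the new rows to the surviving old rows.
    untouched.unionByName(updates.dropDuplicates(Seq(key)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("EmployeeUpsert")
      .master("local[*]") // local mode, for trying the sketch out
      .getOrCreate()
    import spark.implicits._

    // Old employee table from the question.
    val oldDf = Seq(
      (1, "AA", "Hyd"), (2, "BB", "Bglr"), (3, "CC", "Chn"), (4, "DD", "Pune")
    ).toDF("Id", "Name", "Address")

    // New employee table from the question.
    val newDf = Seq(
      (1, "AA", "Hyd"), (2, "BB", "Bglr"), (3, "CC", "US"),
      (4, "DD", "IND"), (5, "EE", "Hyd"), (6, "FF", "Chn")
    ).toDF("Id", "Name", "Address")

    upsert(oldDf, newDf, "Id").orderBy("Id").show()

    spark.stop()
  }
}
```

With the two tables from the question, the result has six rows (`Id` 1 to 6), with `Id` 3 and 4 carrying the updated addresses `US` and `IND`. Because every old `Id` also appears in the new table, the anti-join leaves no old rows in this particular case; it only matters when the old table contains employees that are missing from the new one.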

0 Answers