0

I have two tables, one with values and another with lookup values. I want to replace table 1 with values from table 2 but haven't found an elegant way of doing this.

Table 1 (main)

name       age       country       last_visited
 x1        a1          UK            2016
 x2        a3          US            2017
 x3        a1          China         2015

Table 2 (lookup)

code      value    date_updated
 x1        Dom       2017
 a1        25        2017
 x2        Peter     2017
 x3        Alice     2017
 a3        26        2017

Final Table (What I want)

name       age       country       last_visited
 Dom       25          UK            2016
 Peter     26          US            2017
 Alice     25          China         2015

I have tried joining table 1 to table 2 but then this involved me later replacing / dropping columns and I have to do this individually for each column. I have only given a small portion of columns where i have lookup values above but in reality I have about 8 columns that contain lookup values in the same table. How do I achieve this?

 val unsanitised_data = table1.join(table2, table1.col("name") === table2.col("code"), "left").drop("last_visited", "name", "code")

Now I need to rename "value" to name and do the same thing for age and other columns again. Is this the best way to do this?

zero323
  • 322,348
  • 103
  • 959
  • 935
CodeGeek123
  • 4,341
  • 8
  • 50
  • 79

1 Answers1

3

A simple join and select functions should solve your issue

df1.join(df2, df1("name")=== df2("code"), "left").select($"value".as("name"), $"age", $"country", $"last_visited")
  .join(df2,df1("age")=== df2("code"), "left").select($"name", $"value".as("age"), $"country", $"last_visited")
  .show(false)

Edited

If you have too many columns and doesn't need columns of second table then you can do

df1.join(df2, df1("name")=== df2("code"), "left").withColumn("name", col("value")).drop(df2.columns: _*)
  .join(df2,df1("age")=== df2("code"), "left").withColumn("age", col("value")).drop(df2.columns: _*)
  .show(false)
Ramesh Maharjan
  • 41,071
  • 6
  • 69
  • 97