I have two dataframes as follows:
df1 (reference data)
Tempe, AZ, USA
San Jose, CA, USA
Mountain View, CA, USA
New York, NY, USA
df2 (user-entered data)
Tempe, AZ
Tempe, Arizona
San Jose, USA
San Jose, CA
Mountain View, CA
I would like to get a dataframe (df3) like the following:
----------------------------------------------
| Tempe, AZ, USA         | Tempe, Arizona    |
| Tempe, AZ, USA         | Tempe, AZ         |
| San Jose, CA, USA      | San Jose, CA      |
| San Jose, CA, USA      | San Jose, USA     |
| Mountain View, CA, USA | Mountain View, CA |
----------------------------------------------
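In other words, df3 pairs each df1 row with every df2 row that names the same place: it is the Cartesian product of the two inputs, filtered by a matching predicate. Below is a minimal sketch of that shape on plain Scala collections (no Spark), written script-style for the Scala REPL. The predicate `sameCity` is a hypothetical stand-in that only compares the first comma-separated field. It is not the author's matching logic.

```scala
// Hypothetical predicate (assumption): two entries match when their first
// comma-separated field (the city name) is equal, ignoring case and spaces.
def sameCity(userEntry: String, reference: String): Boolean =
  userEntry.split(",").head.trim.equalsIgnoreCase(reference.split(",").head.trim)

val referenceData = Seq("Tempe, AZ, USA", "San Jose, CA, USA", "Mountain View, CA, USA", "New York, NY, USA")
val userEntered   = Seq("Tempe, AZ", "Tempe, Arizona", "San Jose, USA", "San Jose, CA", "Mountain View, CA")

// Every (reference, user entry) combination, kept only if the predicate accepts it.
val pairs: Seq[(String, String)] =
  for {
    ref  <- referenceData
    user <- userEntered
    if sameCity(user, ref)
  } yield (ref, user)

pairs.foreach { case (ref, user) => println(s"$ref | $user") }
```

"New York, NY, USA" has no matching user entry, so it does not appear in the result, just as in df3.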
I already have a user-defined function:
def isSameAs(str1: String, str2: String): Boolean = {
  ......
}
It takes two strings (the user-entered value and the reference value) and tells me whether they match.
I just need to find the right way to implement this with map in Scala Spark SQL so that I get a dataframe like df3.
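One option is to skip map and use isSameAs as the join condition instead. You cross-join df1 with df2 and keep the rows where a UDF wrapping isSameAs returns true. This is a sketch under stated assumptions, not a definitive answer. The column names `reference` and `userInput` are assumptions, as is the local SparkSession. The body of `isSameAs` here is a hypothetical city-name comparison standing in for the real one. The code targets Scala 2.12/2.13 with Spark 2.1+ (for `Dataset.crossJoin`) and is written script-style, as in spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder()
  .appName("match-user-locations")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the existing isSameAs; replace with the real one.
def isSameAs(userEntry: String, reference: String): Boolean =
  userEntry.split(",").head.trim.equalsIgnoreCase(reference.split(",").head.trim)

// Wrap the Scala function so it can be used inside DataFrame expressions.
val isSameAsUdf = udf((userEntry: String, reference: String) => isSameAs(userEntry, reference))

// Column names are assumptions for this sketch.
val df1 = Seq("Tempe, AZ, USA", "San Jose, CA, USA", "Mountain View, CA, USA", "New York, NY, USA")
  .toDF("reference")
val df2 = Seq("Tempe, AZ", "Tempe, Arizona", "San Jose, USA", "San Jose, CA", "Mountain View, CA")
  .toDF("userInput")

// No equality key exists, so pair every reference row with every user row,
// then keep only the pairs the UDF accepts.
val df3 = df1.crossJoin(df2)
  .filter(isSameAsUdf(col("userInput"), col("reference")))

df3.show(false)
```

The condition is not an equality, so Spark has no key to hash or sort on and must evaluate the UDF on every pair. The cost therefore grows with the product of the two row counts. If the reference table is small, wrapping df1 in `org.apache.spark.sql.functions.broadcast` typically lets Spark plan a broadcast nested-loop join rather than a partition-by-partition Cartesian product.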