Is there a way to join two Spark DataFrames with different column names via two lists?
I know that if the join columns had the same names in both DataFrames, I could pass them as a single list:
val joindf = df1.join(df2, Seq("col_a", "col_b"), "left")
Or, if I knew the differing column names in advance, I could write the join condition out explicitly:
df1.join(
  df2,
  df1("col_a") <=> df2("col_x")
    && df1("col_b") <=> df2("col_y"),
  "left"
)
However, my method takes two lists as input, one per DataFrame, specifying which columns to use for the join.
Does Spark's Scala API have a way of doing this?
P.S. I'm looking for something like pandas' merge in Python:
joindf = pd.merge(df1, df2, left_on = list1, right_on = list2, how = 'left')
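
For reference, here is a minimal sketch of the kind of helper I have in mind, assuming both lists are non-empty and have the same length. The name `joinOnLists` and its parameter names are hypothetical and not part of Spark. It pairs the columns up with zip, builds a null-safe equality (<=>) for each pair, and combines them with &&:

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical helper (not a Spark API): joins on leftCols(i) <=> rightCols(i) for every i.
def joinOnLists(
    left: DataFrame,
    right: DataFrame,
    leftCols: Seq[String],
    rightCols: Seq[String],
    how: String = "left"
): DataFrame = {
  // Assumptions: at least one column pair, and the lists line up one-to-one.
  require(leftCols.nonEmpty, "at least one join column is required")
  require(leftCols.length == rightCols.length, "both lists must have the same length")

  // Build one null-safe equality per column pair, then AND them all together.
  val condition: Column = leftCols
    .zip(rightCols)
    .map { case (l, r) => left(l) <=> right(r) }
    .reduce(_ && _)

  left.join(right, condition, how)
}
```

With the DataFrames above it would be called as joinOnLists(df1, df2, Seq("col_a", "col_b"), Seq("col_x", "col_y")). Since the join uses an expression rather than shared column names, the result keeps the join columns from both sides.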