I have two dataframes as follows.
I want to add a new column to dataframe df_a
from dataframe df_b
column val_1
based on the condition df_a.col_p == df_b.id
df_a = sqlContext.createDataFrame([(1412, 31, 1), (2422, 21, 1), (4223, 22, 2), (
2244, 43, 1), (1235, 54, 1), (4126, 12, 5), (2342, 44, 1 )], ["idx", "col_n", "col_p"])
df_a.show()
+----+-----+-----+
| idx|col_n|col_p|
+----+-----+-----+
|1412| 31| 1|
|2422| 21| 1|
|4223| 22| 2|
|2244| 43| 1|
|1235| 54| 1|
|4126| 12| 5|
|2342| 44| 1|
+----+-----+-----+
df_b = sqlContext.createDataFrame([(1, 1, 1), (2, 1, 1), (3, 1, 2), (
4, 1, 1), (5, 2, 1), (6, 2, 2)], ["id", "val_1", "val_2"])
df_b.show()
+---+-----+-----+
| id|val_1|val_2|
+---+-----+-----+
| 1| 1| 1|
| 2| 1| 1|
| 3| 1| 2|
| 4| 1| 1|
| 5| 2| 1|
| 6| 2| 2|
+---+-----+-----+
Expected output
+----+-----+-----+-----+
| idx|col_n|col_p|val_1|
+----+-----+-----+-----+
|1412| 31| 1| 1|
|2422| 21| 1| 1|
|4223| 22| 2| 1|
|2244| 43| 1| 1|
|1235| 54| 1| 1|
|4126| 12| 5| 2|
|2342| 44| 1| 1|
+----+-----+-----+-----+
My code
cond = (df_a.col_p == df_b.id)
df_a_new = df_a.join(df_b, cond, how ='full').withColumn('val_new', F.when(cond, df_b.val_1))
df_a_new = df_a_new.drop(*['id', 'val_1', 'val_2'])
df_a_new = df_a_new.filter(df_a_new.idx. isNotNull())
df_a_new.show()
How can I get the proper output as expected result with correct index order?