I have two PySpark DataFrames with different columns. The first holds the record IDs that I want to use as row indexes, and the second holds the values:
+------------+--------------+
|     rec_id1|       rec_id2|
+------------+--------------+
|rec-3301-org|rec-3301-dup-0|
|rec-2994-org|rec-2994-dup-0|
|rec-2106-org|rec-2106-dup-0|
|rec-3771-org|rec-3771-dup-0|
|rec-3886-org|rec-3886-dup-0|
| rec-974-org| rec-974-dup-0|
| rec-224-org| rec-224-dup-0|
|rec-1826-org|rec-1826-dup-0|
| rec-331-org| rec-331-dup-0|
|rec-4433-org|rec-4433-dup-0|
+------------+--------------+
+----------+-------+-------------+------+-----+-------+
|given_name|surname|date_of_birth|suburb|state|address|
+----------+-------+-------------+------+-----+-------+
|         0|    1.0|            1|     1|    1|    1.0|
|         0|    1.0|            0|     1|    1|    1.0|
|         0|    1.0|            1|     1|    1|    0.0|
|         0|    1.0|            1|     1|    1|    1.0|
|         0|    1.0|            1|     1|    1|    1.0|
|         0|    1.0|            1|     1|    1|    1.0|
|         0|    1.0|            1|     1|    1|    1.0|
|         0|    1.0|            0|     1|    1|    1.0|
|         0|    1.0|            1|     1|    1|    1.0|
|         0|    1.0|            1|     0|    1|    1.0|
+----------+-------+-------------+------+-----+-------+
I would like to combine the two PySpark DataFrames side by side into one, so that the new DataFrame looks like this:
                             given_name  surname  ...  state  address
rec_id_1     rec_id_2                             ...
rec-3301-org rec-3301-dup-0           0      1.0  ...      1      1.0
rec-2994-org rec-2994-dup-0           0      1.0  ...      1      1.0
rec-2106-org rec-2106-dup-0           0      1.0  ...      1      0.0
Assume both DataFrames have the same number of rows and that rows are matched by position.