I want to join 2 pyspark dfs, where df_template has all the columns and rows that I need in the output and df_proc has data in it for some (but not all) of the row/column combinations in df_template. The code I'm using is:
df_blend = df_template.join(df_proc, ["metro_area"],"left").select(df_template["*"])
But all this returns is the original df_template:
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000501| null| null| null| null| null| null|
| A10000502| null| null| null| null| null| null|
| A10000503| null| null| null| null| null| null|
| A10000504| null| null| null| null| null| null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
This seems pretty basic but I just can't figure out how to get my desired result, any suggestions ??? Here is what I want the output to look like ...
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000501| 1455| 26| 19| 65| 38| null|
| A10000502| 654| 1876| 1950| 886| null| null|
| A10000503| null| null| null| null| null| null|
| A10000504| 774| 854| 1012| 271| null| null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
For reference, here are the original dataframes. df_template
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000501| null| null| null| null| null| null|
| A10000502| null| null| null| null| null| null|
| A10000503| null| null| null| null| null| null|
| A10000504| null| null| null| null| null| null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
df_proc
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000502| 654| 1876| 1950| 886| null| null|
| A10000504| 774| 854| 1012| 271| null| null|
| Al0000501| 1455| 26| 19| 65| 38| null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+