I need to use a left-anti join to pull all the rows that do not match. The problem is that the left-anti join is not flexible when it comes to selecting columns: it only ever lets me select columns from the left DataFrame, and I need to keep some columns from the right DataFrame as well. So I tried:
from pyspark.sql.functions import col

cols_to_keep = ['col1_left_df', 'col2_left_df', 'col3_left_df',
                'col1_right_df', 'col2_right_df', 'col3_right_df']
non_matches = (
    left_df.join(right_df, left_df['col1_left_df'] == right_df['col1_right_df'], how='left_outer')
    # keep only left rows that found no partner: their right-side key comes back null
    .filter(col('col1_right_df').isNull())
    .select(cols_to_keep)
)
This lets me choose columns from both the left and right DataFrames, and it did not return any errors. However, because of the size of the actual data and its complexity (both known and unknown), I am still checking whether it works as intended, which is taking me ages.
My question: is there an alternative way of replicating the left-anti join which would let me select columns from both left and right dataframes?