We have multiple DataFrames.
One of them is the primary DataFrame, which is joined with each of the others using left-outer joins. All of the joins use the same 4 columns (say col1, col2, col3, col4).
To reduce data shuffling, we currently repartition every DataFrame on the 4 join columns and then perform the left-outer joins.
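
To illustrate the idea behind this approach, here is a toy model in plain Python. It is only a sketch, not our actual job and not real DataFrame API code: the partition count, the helper names (`hash_partition`, `left_outer_join`) and the sample rows are all made up for illustration. It assumes hash partitioning on the 4 join columns, so rows with equal keys land in the same partition and each partition can be joined on its own.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen only for this sketch
KEYS = ("col1", "col2", "col3", "col4")  # the 4 join columns


def key_of(row):
    """Return the join key of a row as a tuple of the 4 join columns."""
    return tuple(row[k] for k in KEYS)


def hash_partition(rows, num_partitions=NUM_PARTITIONS):
    """Group rows into partitions by hashing the join key."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(key_of(row)) % num_partitions].append(row)
    return parts


def left_outer_join(primary, other, num_partitions=NUM_PARTITIONS):
    """Left-outer join performed partition by partition.

    Both inputs use the same key and the same partition count, so a primary
    row only needs to look at the matching partition of the other input.
    """
    p_parts = hash_partition(primary, num_partitions)
    o_parts = hash_partition(other, num_partitions)
    result = []
    for pid, p_rows in p_parts.items():
        # Build a lookup for the other side of this partition only.
        lookup = defaultdict(list)
        for o in o_parts.get(pid, []):
            lookup[key_of(o)].append(o)
        for p in p_rows:
            matches = lookup.get(key_of(p))
            if matches:
                for m in matches:
                    result.append({**m, **p})
            else:
                # Unmatched primary rows are kept; a real engine would fill
                # the other side's columns with nulls.
                result.append(dict(p))
    return result


# Made-up sample data.
primary = [
    {"col1": 1, "col2": "a", "col3": 10, "col4": "x", "amount": 100},
    {"col1": 2, "col2": "b", "col3": 20, "col4": "y", "amount": 200},
]
other = [
    {"col1": 1, "col2": "a", "col3": 10, "col4": "x", "label": "match"},
]

joined = sorted(left_outer_join(primary, other), key=lambda r: r["col1"])
```

In this model the only data movement is the initial partitioning step, which is the cost we are trying to reduce.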
Is there a better way to join or repartition these DataFrames so that data shuffling is minimized?
Thanks