I did the following:
- loaded a JSON file as a Spark dataframe
- analyzed the data in 5 columns of this dataframe
- applied a function to the data extracted from these 5 columns (binned the continuous values into 10 bins by percentile, although I don't think the details matter)
- created a new dataframe with spark.createDataFrame, containing all of the new values under 5 completely different column names
- attempted a full outer join of the original dataframe with the new dataframe.
Because none of the column names in my synthesized dataframe match those in the original dataframe, I expected the outer join to behave like simply concatenating the two dataframes along the column axis.
Instead, I receive this error:
AnalysisException: u'Detected implicit cartesian product for FULL OUTER join between logical plans\nUnion\n:- Project\n:
How do I resolve this? I simply want to concatenate the two dataframes column-wise, the way pandas.concat does: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
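
To illustrate the result I am after, here is a minimal pandas sketch. The column names and values are made up for illustration and are not my real data, and it assumes both dataframes have the same number of rows in the same order.

```python
import pandas as pd

# Hypothetical stand-ins for my two dataframes; names and values are made up.
original = pd.DataFrame({"a": [0.1, 0.5, 0.9], "b": [10.0, 20.0, 30.0]})
binned = pd.DataFrame({"a_bin": [0, 4, 9], "b_bin": [0, 5, 9]})

# axis=1 places the frames side by side, aligning rows on the index,
# so row i of `binned` ends up next to row i of `original`.
combined = pd.concat([original, binned], axis=1)
print(combined)
```

The combined frame keeps the original columns and gains the binned ones, with no rows added or dropped. That is the behaviour I would like to reproduce in Spark.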