Let's say we join two DataFrames in PySpark, each with its own alias, and both have the same columns:
from pyspark.sql.functions import col

joined_df = source_df.alias("source").join(
    target_df.alias("target"),
    col("source.A_column") == col("target.A_column"), 'outer')
How can I get a list of column names of the joined_df DataFrame together with the aliases of the DataFrames they came from, something like:
[source.A_column, target.A_column, source.B_column, target.B_column, source.C_column, target.C_column]
These qualified names show up in an AnalysisException, so the information is obviously stored somewhere, but I haven't found a way to display it without triggering the exception...
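For example, selecting the ambiguous name raises the exception, and its message lists exactly the names I'm after (the wording below is from an older Spark version and may differ in yours):

joined_df.select("A_column")
# pyspark.sql.utils.AnalysisException: Reference 'A_column' is ambiguous,
# could be: source.A_column, target.A_column.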
What I tried:
- Get the names as above with some direct method or property, but there is no such thing as a df.columns_with_alias property.
- Get the list of columns from the DataFrame as instances of the Column class (because that class keeps the alias info), but df.columns just gives you strings, and I found no other way (see the snippet below).
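Concretely, this is what both attempts run into:

# No such attribute exists:
joined_df.columns_with_alias  # AttributeError

# And df.columns returns plain, unqualified strings with indistinguishable duplicates:
print(joined_df.columns)
# ['A_column', 'B_column', 'C_column', 'A_column', 'B_column', 'C_column']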
Is there any way to show these column names?