
I need to join two PySpark DataFrames that share a column with the same name (other than the columns I am joining on).

After the join, a select on that column fails with an ambiguity error because Spark doesn't know which of the two columns I mean. The documentation doesn't mention any suffix option like the one Pandas offers; by default, Pandas adds the suffix _x or _y to disambiguate.

Does PySpark have no equivalent suffix option?

https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.join.html

