
I need to join two PySpark DataFrames that share a column with the same name (other than the columns I am joining on).

After the join, a select on that column fails with an ambiguity error because Spark doesn't know which of the two columns I mean. The documentation doesn't mention any suffix option like the one Pandas offers; by default, Pandas adds the suffix _x or _y to disambiguate.

Does PySpark have no equivalent suffix option?

https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.DataFrame.join.html

