
In earlier versions of Spark, I had two SQL tables:

t1: (id, body)
t2: (id, name)

I could query them like:

spark.read.jdbc(url, "t1 inner join t2 on t1.id = t2.id", props)
          .selectExpr("name", "body")

Which would generate the following query:

 select name, body from t1 inner join t2 on t1.id = t2.id

However, with Spark 2.3 I now get the error message:

org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data schema: `id`

despite the fact that I never load that column into Spark. It appears that the column selection is no longer being pushed down to the database.
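
My guess (and it is only a guess, I haven't verified this) is that Spark resolves the JDBC schema up front with a zero-row probe query, something along the lines of:

    select * from (t1 inner join t2 on t1.id = t2.id) where 1=0

which would surface both `id` columns before my `selectExpr` is ever applied.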

Is there any way around this issue?
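
The only workaround I've come up with is to push the projection into the table expression myself as a derived table, roughly like the following untested sketch (`url` and `props` stand in for my real connection settings):

    // Project away the duplicate id columns inside the subquery itself,
    // so Spark only ever sees the columns I actually want.
    val query = "(select name, body from t1 inner join t2 on t1.id = t2.id) as q"
    spark.read.jdbc(url, query, props)
      .selectExpr("name", "body")

but I'd rather not rewrite every join this way if the old behaviour is still available somehow.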
