I have a source query in Sybase, something like this:
query = "select * from Table_A A LEFT JOIN TABLE_B B ON A.id = B.id order by id"
I am trying to fetch the data from this query into Spark over JDBC. It seems I have to wrap the query in parentheses and give it an alias to read it in Spark, like this:
query = "(select * from Table_A A LEFT JOIN TABLE_B B ON A.id = B.id order by id) foo"
Then read it into a Spark DataFrame as:
val src_df = spark.read
.format("jdbc")
.option("url", "$SYBASE_JDBC_URL")
.option("dbtable", query)
.option("user", "$USERNAME")
.option("password", "$PASSWORD")
.load()
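For context on the scaling requirement: what I am ultimately after is a partitioned JDBC read like the sketch below (the partition column and bounds are placeholders for my real values), so Spark pulls the data over several parallel connections instead of one. This is the kind of read that currently fails with the errors below:

```scala
import org.apache.spark.sql.SparkSession

// Placeholders for the redacted connection details above.
val SYBASE_JDBC_URL = "jdbc:sybase:Tds:host:port/db"
val USERNAME = "user"
val PASSWORD = "password"

val spark = SparkSession.builder.appName("sybase-read").getOrCreate()

// Wrapped query without ORDER BY; sorting can be done in Spark afterwards.
val query = "(select * from Table_A A LEFT JOIN TABLE_B B ON A.id = B.id) foo"

val src_df = spark.read
  .format("jdbc")
  .option("url", SYBASE_JDBC_URL)
  .option("dbtable", query)
  .option("user", USERNAME)
  .option("password", PASSWORD)
  // Standard Spark JDBC partitioning options: the read is split into
  // numPartitions parallel range scans over partitionColumn.
  .option("partitionColumn", "id")
  .option("lowerBound", "1")       // placeholder: real min(id)
  .option("upperBound", "1000000") // placeholder: real max(id)
  .option("numPartitions", "8")
  .load()
```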
It gives me the following error:
Exception in thread "main" com.sybase.jdbc4.jdbc.SybSQLException: An ORDER BY clause is not allowed in a derived table.
So I removed the ORDER BY from my query, like this:
query = "(select * from Table_A A LEFT JOIN TABLE_B B ON A.id = B.id) foo"
And I get another error like this:
Exception in thread "main" java.sql.SQLException: JZ0P8: The RSMDA Column Type Name you requested is unknown. This is a SAP internal error; please report it to technical support.
However, if I run the same query from a plain Scala JDBC script, it works fine:
val query = "select * from Table_A A LEFT JOIN TABLE_B B ON A.id = B.id order by id"
val connection = java.sql.DriverManager.getConnection(SYBASE_JDBC_URL, USERNAME, PASSWORD)
val statement = connection.createStatement()
val resultSet: java.sql.ResultSet = statement.executeQuery(query)
A similar query works fine with Spark against other databases, like Oracle.
Is there a way to change my query so it works with Spark? If not, can I convert the java.sql.ResultSet to a DataFrame as an alternative? I am also looking for a solution that scales to large data.
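For the ResultSet alternative, this untested sketch is the kind of conversion I have in mind: build a schema from the ResultSet metadata (reading every column as a string to avoid the column-type-name lookup that seems to trigger JZ0P8) and hand the rows to createDataFrame. My worry is that it materialises everything on the driver, so it would not scale to large data:

```scala
import java.sql.DriverManager
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Placeholders for the redacted connection details above.
val SYBASE_JDBC_URL = "jdbc:sybase:Tds:host:port/db"
val USERNAME = "user"
val PASSWORD = "password"

val spark = SparkSession.builder.appName("sybase-fallback").getOrCreate()
val connection = DriverManager.getConnection(SYBASE_JDBC_URL, USERNAME, PASSWORD)
val statement = connection.createStatement()
val resultSet = statement.executeQuery(
  "select * from Table_A A LEFT JOIN TABLE_B B ON A.id = B.id order by id")

// Build the schema from the ResultSet metadata; every column is read as a
// string here, with casts done later in Spark if needed.
val md = resultSet.getMetaData
val schema = StructType((1 to md.getColumnCount)
  .map(i => StructField(md.getColumnLabel(i), StringType, nullable = true)))

// Materialise all rows on the driver -- this is the part that does not
// scale, since the whole result set passes through a single JVM.
val rows = Iterator.continually(resultSet)
  .takeWhile(_.next())
  .map(rs => Row.fromSeq((1 to md.getColumnCount).map(i => rs.getString(i))))
  .toSeq

val src_df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
```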
Thanks