
So I have a source query in Sybase, something like this:

query = "select * from Table_A A LEFT JOIN TABLE_B B ON A.id = B.id order by id"

I am trying to fetch the data from the source query into Spark using JDBC. It seems I have to wrap my query in parentheses and give it an alias to read it in Spark, like this:

query = "(select * from Table_A A LEFT JOIN TABLE_B B ON A.id = B.id order by id) foo"

Then I read it into a Spark DataFrame as:

val src_df = spark.read
  .format("jdbc")
  .option("url", "$SYBASE_JDBC_URL")
  .option("dbtable", query )
  .option("user", "$USERNAME")
  .option("password", "$PASSWORD")
  .load()

It gives me the following error:

Exception in thread "main" com.sybase.jdbc4.jdbc.SybSQLException: An ORDER BY clause is not allowed in a derived table.

So I changed my query to remove order by to something like this

query = "(select * from Table_A A LEFT JOIN TABLE_B B ON A.id = B.id) foo"

And I get another error like this:

Exception in thread "main" java.sql.SQLException: JZ0P8: The RSMDA Column Type Name you requested is unknown.  This is a SAP internal error; please report it to technical support.

However, if I run a simple Scala script that executes the query over plain JDBC, it works fine:

val connection = java.sql.DriverManager.getConnection(url, username, password)
val query = "select * from Table_A A LEFT JOIN TABLE_B B ON A.id = B.id order by id"
val statement = connection.createStatement()
val resultSet: java.sql.ResultSet = statement.executeQuery(query)

A similar query works fine with other databases, such as Oracle, through Spark.

Is there a way I can change my query to make it work with Spark? If not, can I convert the java.sql.ResultSet to a DataFrame as an alternative? I am also looking for a solution that scales to large data.
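For the ResultSet route, this is the kind of conversion I have in mind (an untested sketch on my part; it treats every column as a string for simplicity, and since it materializes all rows on the driver I suspect it will not scale):

```scala
import java.sql.ResultSet
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Sketch: build a DataFrame from a ResultSet by pulling all rows onto the
// driver and parallelizing them. A real version would map JDBC types to
// Spark types instead of using StringType everywhere.
def resultSetToDf(spark: SparkSession, rs: ResultSet): DataFrame = {
  val md = rs.getMetaData
  val n  = md.getColumnCount
  val schema = StructType((1 to n).map(i => StructField(md.getColumnLabel(i), StringType)))
  val rows = Iterator
    .continually(rs)
    .takeWhile(_.next())                       // advance the cursor row by row
    .map(r => Row.fromSeq((1 to n).map(i =>
      Option(r.getObject(i)).map(_.toString).orNull)))
    .toList                                    // everything ends up in driver memory
  spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
}
```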

Thanks

2shar

1 Answer


A couple of questions first: 1. Is your Sybase database ASA or ASE? 2. Try listing the columns explicitly after SELECT instead of using *, and show the result.
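For example, with an explicit column list (the column names here are placeholders, not taken from your schema), the read might look like this. Note that SELECT * on your join returns two columns named id, which can confuse the column metadata Spark requests:

```scala
// Placeholder columns: substitute the real columns of Table_A / TABLE_B.
// Listing them explicitly avoids the duplicate `id` column from SELECT *.
val query =
  """(select A.id, A.col1, B.col2
    |from Table_A A left join TABLE_B B on A.id = B.id) foo""".stripMargin

val src_df = spark.read
  .format("jdbc")
  .option("url", sybaseJdbcUrl)   // your Sybase JDBC URL
  .option("dbtable", query)
  .option("user", username)
  .option("password", password)
  .load()
```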

mvasyliv