0

I have a scenario where I am able to read data frame from Apache Drill using Spark SQL(Python) but I am not able to perform any joins.

Below is how the queries look like

def drillConnection(tableName):
    return spark.read.format("jdbc").options(url="<connection url>",driver="<drill driver>",dbtable="<table>",user="<un>",password="<pwd>").load()

df_query = '''SELECT COL1,COL2,COL3 FROM TABLE'''
# <We have another data frame read from a file. Lets name it DataF>
df = drillConnection(df_query)
# drillConnection is where we get the data as a data frame from drill
newDF = df.join(DataF,'COL1')
newDF.show()

After running the above piece of code, I am getting the below error

java.sql.SQLException: Failed to create prepared statement: PARSE ERROR: Encountered "\"" at line 1 col 22.

To be exact, the error at the query level where Spark internally converts looks like this

Query SELECT * FROM (SELECT "COL1" FROM (SELECT COL1,COL2,COL3 FROM TABLE)) LIMIT 0
                            ^

Kindly help me on this as it does not seem like I'm doing anything wrong. It's a basic operation and it works well when I run in Hive. Thanks.

PS: Please note that its not a duplicate of Spark SQL - load data with JDBC using SQL statement, not table name as the op there was looking to read the tables from a database. Where as here, I am able to extract the data from database and I am facing issue with respect to join. I am not doing anything like adding double quotes to the columns names etc. Its spark thats doing it internally and not me. The dummy error that I am seeing above is after spark did its own query and not something that I wrote. I am preforming a simple join operation after reading data from Drill. Hope this clarifies.

  • By default Drill uses back-ticks as quoting identifier. In your case you have double-quotes thus query submission fails. To change quoting identifier you can use system / session option or via jdbc string: **https://drill.apache.org/docs/lexical-structure/** – Arina Ielchiieva Jan 03 '18 at 15:09
  • You can use JDBCDialects to customize identifier quote. Refer to this question.https://stackoverflow.com/questions/35476076/integrating-spark-sql-and-apache-drill-through-jdbc You can override 'quoteIdentifier' method to return column name with back-ticks. – InfamousCoconut Jan 25 '18 at 08:14
  • @user8371915 - The question is NOT duplicate. Please unset the duplicate tag and maybe read the question again. – Abhinandan Dubey May 29 '18 at 20:25

0 Answers0