I have a scenario where I am able to read data frame from Apache Drill using Spark SQL(Python) but I am not able to perform any joins.
Below is how the queries look like
def drillConnection(tableName):
return spark.read.format("jdbc").options(url="<connection url>",driver="<drill driver>",dbtable="<table>",user="<un>",password="<pwd>").load()
df_query = '''SELECT COL1,COL2,COL3 FROM TABLE'''
# <We have another data frame read from a file. Lets name it DataF>
df = drillConnection(df_query)
# drillConnection is where we get the data as a data frame from drill
newDF = df.join(DataF,'COL1')
newDF.show()
After running the above piece of code, I am getting the below error
java.sql.SQLException: Failed to create prepared statement: PARSE ERROR: Encountered "\"" at line 1 col 22.
To be exact, the error at the query level where Spark internally converts looks like this
Query SELECT * FROM (SELECT "COL1" FROM (SELECT COL1,COL2,COL3 FROM TABLE)) LIMIT 0
^
Kindly help me on this as it does not seem like I'm doing anything wrong. It's a basic operation and it works well when I run in Hive. Thanks.
PS: Please note that its not a duplicate of Spark SQL - load data with JDBC using SQL statement, not table name as the op there was looking to read the tables from a database. Where as here, I am able to extract the data from database and I am facing issue with respect to join. I am not doing anything like adding double quotes to the columns names etc. Its spark thats doing it internally and not me. The dummy error that I am seeing above is after spark did its own query and not something that I wrote. I am preforming a simple join operation after reading data from Drill. Hope this clarifies.