
I am trying to fetch records from a mainframe table using the Spark SQL context with the query below:

data_config.db2_qry = "SELECT A.E_No, A.E_Name FROM Scheema.Employee A WITH UR"

but it is throwing the error below:

com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-199, SQLSTATE=42601, SQLERRMC=WITH;HAVING WHERE GROUP ORDER INTERSECT MINUS EXCEPT UNION ) , FETCH, DRIVER=4.19.26

But if I run the same query directly in the mainframe console, it works fine.

How do I use the WITH UR clause in the Spark SQL context?

I am using Spark version 2.4.0.

I am retrieving the records like this:

filt_cond = "(" + data_config.db2_qry + ") ref_id"

db2Df = sqlContext.read.format("jdbc") \
    .option("url", data_config.db2_url) \
    .option("driver", "com.ibm.db2.jcc.DB2Driver") \
    .option("dbtable", filt_cond) \
    .option("user", data_config.db2_uname) \
    .option("password", data_config.db2_passwd) \
    .load()

Biplab

1 Answer


The issue is in the query that is sent down to the mainframe DB2: the Spark JDBC method chosen to push "WITH UR" needs to change.

The Spark JDBC read method used here is

def jdbc(url: String, table: String, properties: Properties): DataFrame

and say with this method we push the following query to the DB2 SQL engine:

"select a, b, c, d from table where d is not null with UR as table"

This is not the query that reaches the mainframe DB2 SQL engine. Spark wraps it in a subquery and sends the SQL as

select a, b, c from (select a, b, c from table where d is not null with UR) as table

and this is where the trouble starts: WITH UR now sits inside a subquery, where DB2 does not allow it. That is exactly what SQLCODE -199 complains about; at that position DB2 expects tokens such as HAVING, WHERE, GROUP, ORDER, or a closing parenthesis, not WITH.

If you want to see the same error in SPUFI, QMF, or another mainframe tool, try running the query constructed by Spark rather than what we wrote in code.
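
For the query in the question, for example, the statement that actually reaches DB2 would look roughly like this (a sketch of Spark's subquery wrapping, not captured from a real trace):

select * from (SELECT A.E_No, A.E_Name FROM Scheema.Employee A WITH UR) ref_id

Running that in SPUFI or QMF should reproduce the same SQLCODE -199 error.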

To overcome this issue and keep the "WITH UR" syntax in the SQL, switch from the Spark JDBC method above to the following one, which lets us supply predicates:

def jdbc(url: String, table: String, predicates: Array[String],
         connectionProperties: Properties): DataFrame

Push the SQL as "select a, b, c, d from table as tbl"

with predicates = Array("d is not null with UR")

In this case the expected query is pushed down: Spark appends each predicate as a WHERE clause on the outer query, so WITH UR ends up at the end of the statement, where DB2 accepts it. Hope this helps you find the direction to solve it.
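
In PySpark, which the question uses, the same switch means calling the jdbc() method of DataFrameReader directly with its predicates parameter, instead of the generic .format("jdbc") reader. A minimal sketch under that assumption, reusing the data_config fields from the question; the predicate value is illustrative, and the column names must be the ones visible on the pushed-down subquery:

# each entry in predicates becomes the WHERE clause of one partition's query,
# so "WITH UR" lands at the end of the statement, where DB2 accepts it
predicates = ["E_No IS NOT NULL WITH UR"]

db2Df = sqlContext.read.jdbc(
    url=data_config.db2_url,
    table="(SELECT A.E_No, A.E_Name FROM Scheema.Employee A) ref_id",
    predicates=predicates,
    properties={
        "user": data_config.db2_uname,
        "password": data_config.db2_passwd,
        "driver": "com.ibm.db2.jcc.DB2Driver",
    },
)

Note that predicates is an argument of the jdbc() method itself; setting it via .option("predicates", ...) on the generic reader is silently ignored, which is what happens in the comments below.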

You can find more detail on the Spark JDBC read methods here: Link

Karthick
  • Hey @Karthick, thanks for your reply. As I am using PySpark, the jdbc definition looks like [http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.jdbc]. So I tried the below but it did not work: predicates = "where A.E_No=323156 with ur " db2Df = .. "dbtable", filt_cond).option("predicates", predicates).load() – Biplab Mar 16 '19 at 11:55
  • So I tried using predicates = "where A.E_No=323156 with ur " db2Df = sqlContext.read.format("jdbc").option("url", data_config.db2_url).option("driver", "com.ibm.db2.jcc.DB2Driver").option("dbtable", filt_cond).option("predicates", predicates).option("user", data_config.db2_uname).option("password", data_config.db2_passwd).load() – Biplab Mar 16 '19 at 11:55
  • "where" clause is built by spark, no need to add that clause in predicates and what is the error showing out ? also make sure you hit the right spark jdbc method. predicates is Array[String] where you can keep adding more predicates which also defines partition. – Karthick Mar 16 '19 at 12:17
  • It is not showing any error; it is retrieving all the records irrespective of the condition. I added it as an array now, though: ` predicates = ["A.E_No=323156 with ur "] ` – Biplab Mar 16 '19 at 12:34
  • I see, that means .option("predicates", "blah blah blah") is not applied, so a plain select is what went down to the DB. This indicates you are not using the right Spark JDBC call. I think this discussion on SO will help you: https://stackoverflow.com/questions/48677883/using-predicates-in-spark-jdbc-read – Karthick Mar 16 '19 at 12:42
  • Thanks a lot @Karthick. I am able to make it work with the provided link. – Biplab Mar 25 '19 at 15:18