
I am trying to extract data from Db2 into Spark using read.jdbc. I am unable to pass the WITH UR string in the query.

How do we set the isolation level to UR in the Spark JDBC read?

from pyspark.sql import SparkSession

# Build the session with the DB2 JCC driver jar on the driver classpath
spark = SparkSession.builder.config('spark.driver.extraClassPath', '/home/user/db2jcc4.jar').getOrCreate()

jdbcUrl = "jdbc:db2://{0}:{1}/{2}".format("db2.1234.abcd.com", "3910", "DSN")
connectionProperties = {
  "user" : "user1",
  "password" : "password1",
  "driver" : "com.ibm.db2.jcc.DB2Driver",
  "fetchsize" : "100000"
}
pushdown_query = "(SELECT T6.COLUMN1, T6.COLUMN2, TO_DATE('07/11/2019 10:52:24', 'MM/DD/YYYY HH24:MI:SS') AS INSERT_DATE FROM DB1.T6 WITH UR) ALIAS"
print(jdbcUrl)
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, column="COLUMN1", lowerBound=1, upperBound=12732076, numPartitions=5, properties=connectionProperties)

This fails with the error: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-199, SQLSTATE=42601, SQLERRMC=UR;;FETCH , ) OFFSET LIMIT INTERSECT ORDER GROUP WHERE HAVING JOIN, DRIVER=4.13.80

If I remove the WITH UR it works. Is there a way to pass a query with WITH UR in a Spark JDBC read?

There is an isolationLevel connection parameter in the JDBC data source, but it is documented as applying only to writing: "The transaction isolation level, which applies to current connection. It can be one of NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE, corresponding to standard transaction isolation levels defined by JDBC's Connection object, with default of READ_UNCOMMITTED. This option applies only to writing. Please refer the documentation in java.sql.Connection."

Will the below do the trick?

connectionProperties = {
  "user" : "user1",
  "password" : "password1",
  "driver" : "com.ibm.db2.jcc.DB2Driver",
  "fetchsize" : "100000",
  "isolationLevel" : "READ_UNCOMMITTED"
}
Rafa

2 Answers


According to the DB2 documentation, we can pass defaultIsolationLevel=1 in the connection details, which means uncommitted read (UR). Check out the link: https://www.ibm.com/support/pages/how-set-isolation-level-db2-jdbc-database-connections
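
For example, here is a minimal sketch based on the connection details from the question (host, port, database, and credentials are the question's placeholders), passing the property through the JCC URL as the IBM page describes:

# Append defaultIsolationLevel=1 (uncommitted read / UR) as a driver
# property on the JCC URL; the query text then no longer needs WITH UR.
jdbcUrl = "jdbc:db2://db2.1234.abcd.com:3910/DSN:defaultIsolationLevel=1;"
connectionProperties = {
  "user" : "user1",
  "password" : "password1",
  "driver" : "com.ibm.db2.jcc.DB2Driver",
  "fetchsize" : "100000"
}
pushdown_query = "(SELECT T6.COLUMN1, T6.COLUMN2 FROM DB1.T6) ALIAS"
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)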


According to the documentation and to this blog, the isolationLevel option is ignored in a read action.

To be honest, I don't understand why, since java.sql.Connection.setTransactionIsolation sets a default for the whole connection, and as far as I know the read does not set the isolation level by itself.

Nevertheless, a different approach is offered here.

So the following should work out for you:

from pyspark.sql import SparkSession

# Build the session with the DB2 JCC driver jar on the driver classpath
spark = SparkSession.builder.config('spark.driver.extraClassPath', '/home/user/db2jcc4.jar').getOrCreate()

jdbcUrl = "jdbc:db2://{0}:{1}/{2}".format("db2.1234.abcd.com", "3910", "DSN")
connectionProperties = {
  "user" : "user1",
  "password" : "password1",
  "driver" : "com.ibm.db2.jcc.DB2Driver",
  "fetchsize" : "100000"
}

df = spark.read.jdbc(
    url=jdbcUrl,
    table="DB1.T6",
    predicates=["1=1 WITH UR"],
    properties=connectionProperties
).select("COLUMN1", "COLUMN2", ...)

I used the 1=1 clause to make a valid WHERE condition. This approach feels like there must be a cleaner way, but it works fine.
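
Note that predicates and the column/lowerBound/upperBound options are mutually exclusive: each predicate becomes one partition. Here is a sketch of how to keep the question's five partitions on COLUMN1 while still appending WITH UR; the split points are illustrative, derived from the question's bounds:

# One query per predicate; WITH UR rides along in each WHERE clause.
bounds = [1, 2546416, 5092831, 7639246, 10185661, 12732077]
predicates = [
    "COLUMN1 >= {0} AND COLUMN1 < {1} WITH UR".format(lo, hi)
    for lo, hi in zip(bounds, bounds[1:])
]
df = spark.read.jdbc(url=jdbcUrl, table="DB1.T6", predicates=predicates, properties=connectionProperties)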

armfri