
I have a project in a Python notebook that ran fine while Spark was hosted in Bluemix.

We run the following code to connect to an on-premises Netezza database, and it worked fine in Bluemix:

VT = sqlContext.read.format('jdbc').options(url='jdbc:netezza://169.54.xxx.x:xxxx/BACC_PRD_ISCNZ_GAPNZ', user='XXXXXX', password='XXXXXXX', dbtable='GRACE.CDVT_LIVE_SPARK', driver='org.netezza.Driver').load()

However, after migrating to Data Science Experience, we get the error below. I have set up the Secure Gateway and it is working fine, but this code fails. I think the issue is with the Netezza driver. If so, is there a way to explicitly import the class/driver so the code above can run? Please advise on how we can address the issue.

Error Message:


/usr/local/src/spark20master/spark/python/pyspark/sql/utils.py in  deco(*a, **kw)
61     def deco(*a, **kw):
62         try:
---> 63             return f(*a, **kw)
64         except py4j.protocol.Py4JJavaError as e:
65             s = e.java_exception.toString()

/usr/local/src/spark20master/spark/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
317                 raise Py4JJavaError(
318                     "An error occurred while calling {0}{1} {2}.\n".
--> 319                     format(target_id, ".", name), value)
320             else:
321                 raise Py4JError(

Py4JJavaError: An error occurred while calling o212.load.
: java.lang.ClassNotFoundException: org.netezza.driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:607)
at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:844)
at java.lang.ClassLoader.loadClass(ClassLoader.java:823)
at java.lang.ClassLoader.loadClass(ClassLoader.java:803)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:49)
at scala.Option.foreach(Option.scala:257)
Chris Snow
Sagar KSK

3 Answers


You can install a jar file by adding a notebook cell that starts with an exclamation mark and runs a Unix tool to download the file; this example uses wget:

!wget https://some.public.host/yourfile.jar -P  ${HOME}/data/libs

After downloading the file, you will need to restart your kernel.

Note that this approach assumes your jar file is publicly available on the Internet.
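If wget is not available in the notebook environment, the same download can be done from pure Python. This is a minimal sketch following the ~/data/libs convention from the answer above; the helper names are mine, not part of DSX:

```python
import os
from urllib.request import urlretrieve

def jar_target_path(url, libs_dir):
    """Build the destination path for a jar inside the notebook libs directory."""
    return os.path.join(libs_dir, os.path.basename(url))

def fetch_jar(url, libs_dir=os.path.expanduser("~/data/libs")):
    """Download a jar into libs_dir, creating the directory if needed."""
    os.makedirs(libs_dir, exist_ok=True)
    dest = jar_target_path(url, libs_dir)
    urlretrieve(url, dest)  # plain HTTP(S) download, equivalent to wget -P
    return dest
```

As with wget, the kernel still has to be restarted afterwards so Spark picks up the new jar.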

Chris Snow
  • Thanks for your suggestions. I could install the jar file 'nzjdbc3.jar' successfully as suggested by Chris, and I also tried the commands provided by Roland and could see the imported jar file. I restarted the kernel after doing so. However, I still get the same error message. Is there any step I am missing? After installing the jar files, should I execute any other command for my code to run? Please advise. Thank you. – Sagar KSK Jan 25 '17 at 13:45

Notebooks in Bluemix and notebooks in DSX (Data Science Experience) currently use the same backend, so they have access to the same pre-installed drivers. Netezza isn't among them. As Chris Snow pointed out, users can install additional JARs and Python packages into their service instances.

You probably created a new service instance for DSX and did not yet install the user JARs and packages that the old one had. It's a one-time setup, and therefore easy to forget when you've been using the same instance for a while. Run these commands in a Python notebook on the old Bluemix instance to list the user-installed items:

!ls -lF ~/data/libs
!pip freeze

Then install the missing things into your new instance on DSX.
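To see exactly what the new instance is still missing, the two pip freeze outputs can be diffed by package name. A minimal sketch; the helper name is mine, not part of DSX:

```python
def missing_packages(old_freeze, new_freeze):
    """Return requirement lines present in the old instance but absent from the new one.

    Each argument is the raw text of a `pip freeze` listing.
    """
    old = {line.split("==")[0].lower(): line
           for line in old_freeze.splitlines() if line.strip()}
    new = {line.split("==")[0].lower()
           for line in new_freeze.splitlines() if line.strip()}
    return [req for name, req in sorted(old.items()) if name not in new]
```

Anything the function returns (plus any jars listed under ~/data/libs) needs to be reinstalled on the DSX instance.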

Roland Weber

There is another way to connect to Netezza: the ingest connector, which is enabled by default in DSX.

http://datascience.ibm.com/docs/content/analyze-data/python_load.html

from ingest import Connectors
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

NetezzaloadOptions = {
    Connectors.Netezza.HOST              : 'hostorip',
    Connectors.Netezza.PORT              : 'port',
    Connectors.Netezza.DATABASE          : 'databasename',
    Connectors.Netezza.USERNAME          : 'xxxxx',
    Connectors.Netezza.PASSWORD          : 'xxxx',
    Connectors.Netezza.SOURCE_TABLE_NAME : 'tablename'}

NetezzaDF = sqlContext.read.format("com.ibm.spark.discover").options(**NetezzaloadOptions).load()

NetezzaDF.printSchema()
NetezzaDF.show()

Thanks,

Charles.

charles gomes
  • Hi Charles, I was able to connect to Netezza with the ingest connector on Bluemix. However, of late I am getting this error: 'NoneType' object has no attribute 'Netezza'. Any suggestion to fix this? Thank you. AttributeError traceback (most recent call last): ... from ingest import Connectors ... Connectors.Netezza.HOST : '169.54.229.7', Connectors.Netezza.PORT : '16587', ... AttributeError: 'NoneType' object has no attribute 'Netezza' – Sagar KSK Jun 19 '17 at 08:43