I am trying to connect to a Teradata server through PySpark.
My code, run from the CLI, is as below:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Teradata connect") \
    .getOrCreate()

# Read the Teradata table over JDBC
df = spark.read \
    .format("jdbc") \
    .options(url="jdbc:teradata://xy/",
             driver="com.teradata.jdbc.TeraDriver",
             dbtable="dbname.tablename",
             user="user1", password="***") \
    .load()
This gives the error:
py4j.protocol.Py4JJavaError: An error occurred while calling o159.load. : java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver
To resolve this, I think I need to add the jars terajdbc4.jar and tdgssconfig.jar.
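From what I have read, the driver jars have to be on the classpath when the session starts. This is a minimal sketch of what I would expect to work, using the spark.jars config property (the jar paths below are placeholders, and I am not sure this is the right approach):

from pyspark.sql import SparkSession

# Placeholder paths to the Teradata driver jars on my machine;
# spark.jars takes a comma-separated list of jar paths
jars = "/path/to/terajdbc4.jar,/path/to/tdgssconfig.jar"

spark = SparkSession.builder \
    .appName("Teradata connect") \
    .config("spark.jars", jars) \
    .getOrCreate()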
In Scala, to add a jar we can use
sc.addJar("<path>/jar-name.jar")
If I try the same in PySpark, I get an error:
AttributeError: 'SparkContext' object has no attribute 'addJar'.
or
AttributeError: 'SparkSession' object has no attribute 'addJar'
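For reference, this is roughly what I tried in PySpark (the path is a placeholder):

# Both attempts raise the AttributeErrors shown above
spark.sparkContext.addJar("/path/to/terajdbc4.jar")  # no addJar on the Python SparkContext
spark.addJar("/path/to/terajdbc4.jar")               # no addJar on SparkSession either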
How can I add the jars terajdbc4.jar and tdgssconfig.jar?