So, I have a PySpark program that runs fine with the following command:
spark-submit --jars terajdbc4.jar,tdgssconfig.jar --master local sparkyness.py
And yes, it's running in local mode and executing only on the master node.
I want to be able to launch my PySpark script, though, with just:
python sparkyness.py
So, I have added the following lines of code throughout my PySpark script to facilitate that (the context creation is shown here for completeness):

import findspark
findspark.init()

from pyspark import SparkConf, SparkContext

sconf = SparkConf()
sconf.setMaster("local")
sc = SparkContext(conf=sconf)
sc._jsc.addJar('/absolute/path/to/tdgssconfig.jar')
sc._jsc.addJar('/absolute/path/to/terajdbc4.jar')
This does not seem to be working, though. Every time I try to run the script with python sparkyness.py,
I get the error:
py4j.protocol.Py4JJavaError: An error occurred while calling o48.jdbc.
: java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver
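For reference, the call that blows up is a plain JDBC read along these lines, using the sc created above (the URL, table name, and credentials here are placeholders, not my real values):

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

# Placeholder connection details; the driver class is the one named in the error
df = sqlContext.read.jdbc(
    url='jdbc:teradata://hostname/DATABASE=mydb',
    table='mytable',
    properties={
        'user': 'username',
        'password': 'password',
        'driver': 'com.teradata.jdbc.TeraDriver',
    },
)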
What is the difference between spark-submit --jars and sc._jsc.addJar('myjar.jar'), and what could be causing this issue? Do I need to do more than just sc._jsc.addJar()?
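My current suspicion is that spark-submit --jars puts the jars on the driver's classpath before the JVM starts, while sc._jsc.addJar() only registers them after the driver JVM is already running, which would be too late for the JDBC DriverManager to find the Teradata driver. If that's right, I assume something like the sketch below (setting the jars on the SparkConf before the context is created; the paths are the same placeholders as above) is the direction to go, but I'd like to understand whether that reasoning is correct:

import findspark
findspark.init()

from pyspark import SparkConf, SparkContext

# Supply the jars when the driver JVM is launched, instead of
# adding them after the fact with sc._jsc.addJar()
jars = ['/absolute/path/to/tdgssconfig.jar', '/absolute/path/to/terajdbc4.jar']

sconf = SparkConf()
sconf.setMaster("local")
sconf.set("spark.jars", ",".join(jars))                    # comma-separated, like --jars
sconf.set("spark.driver.extraClassPath", ":".join(jars))   # classpath-style separator
sc = SparkContext(conf=sconf)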