While working on Spark (PySpark), I have successfully integrated Anaconda (I created a new virtual env) and Jupyter Notebook with it. That is, I can start a Jupyter notebook server and run Spark code from a notebook under the virtual env. Spark jobs are executed on the cluster's worker nodes, which also have Anaconda installed and are configured to run under the same virtual env. Everything works fine; I can even install third-party Python libraries in the virtual env on all the nodes and call functions from them.
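For reference, this is roughly how I launch the notebook server (the env path and master URL below are placeholders for my actual setup):

    # make the workers use the virtual env's Python interpreter
    export PYSPARK_PYTHON=/path/to/anaconda/envs/myenv/bin/python
    # make the pyspark driver start Jupyter Notebook instead of a plain shell
    export PYSPARK_DRIVER_PYTHON=jupyter
    export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
    pyspark --master spark://master:7077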
Now I am trying to save a DataFrame to a MySQL database. I found some tutorials for this, but the only way they show to add the JDBC driver dependency is via the spark-submit command, like this:
    bin/spark-submit --jars external/mysql-connector-java-5.1.40-bin.jar \
        /path_to_your_program/spark_database.py
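For context, the save itself would use the standard DataFrame JDBC writer. Assuming a SparkSession named spark is already available in the notebook, this is the kind of code I want to run (the URL, table name, and credentials are placeholders):

    # a toy DataFrame just to illustrate the write
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    df.write.jdbc(
        url="jdbc:mysql://localhost:3306/mydb",
        table="mytable",
        mode="append",
        properties={
            "user": "myuser",
            "password": "mypassword",
            # driver class shipped in mysql-connector-java 5.1.x
            "driver": "com.mysql.jdbc.Driver",
        },
    )

As far as I can tell, without the connector jar on the classpath this fails with java.lang.ClassNotFoundException: com.mysql.jdbc.Driver.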
I'm wondering how I can integrate this JDBC jar dependency into the Anaconda virtual env and Jupyter Notebook, so that I can keep using the notebook to test Spark code.
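One idea I had is to point the SparkSession at the jar when it is created in the notebook, as in the sketch below (the jar path is a placeholder), but I'm not sure whether this is the right approach, or whether the jar also needs to be copied to every worker node:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("notebook-jdbc-test")
        # placeholder path; spark.jars lists jars to include on the
        # driver and executor classpaths
        .config("spark.jars", "/path/to/mysql-connector-java-5.1.40-bin.jar")
        .getOrCreate()
    )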
Thanks a lot for any response.