What is the standard development process for Spark with Python using some kind of IDE, for
- Data exploration on the cluster
- Application development?
I found the following answers, which do not satisfy me:
a) Zeppelin/Jupyter notebooks running "on the cluster"
b)
- Install Spark and PyCharm locally,
- use some local files containing dummy data to develop locally,
- change references in the code to some real files on the cluster (a minimal sketch of this workflow follows below),
- execute the script using spark-submit in the console on the cluster.
- source: https://de.hortonworks.com/tutorial/setting-up-a-spark-development-environment-with-python/
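
To make b) concrete, here is a minimal sketch of the kind of script I mean. Passing the input path as a command-line argument is just my own assumption for avoiding hard-coded file references; the tutorial itself does not prescribe this, and the file names are hypothetical:

```python
# Minimal sketch (assumption, not from the tutorial): take the input path as an
# argument so the same script runs on local dummy data and on cluster files
# without editing the code.
import sys
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Hypothetical default path for local development; override it when submitting.
    input_path = sys.argv[1] if len(sys.argv) > 1 else "data/dummy.csv"

    spark = SparkSession.builder.appName("exploration").getOrCreate()

    # Same code path for local and cluster runs; only the argument changes.
    df = spark.read.csv(input_path, header=True, inferSchema=True)
    df.printSchema()
    df.show(10)

    spark.stop()
```

Locally this could be run as `spark-submit my_job.py data/dummy.csv` and on the cluster as `spark-submit my_job.py hdfs:///path/to/real/data.csv` (both paths hypothetical), so only the argument changes between environments.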
I would love to do a) and b) using some locally installed IDE that communicates with the cluster directly, because I dislike the idea of creating local dummy files and changing the code before running it on the cluster. I would also prefer an IDE over a notebook. Is there a standard way to do this, or are my answers above already "best practice"?