
I'm developing a web application that retrieves data from a data lake. The data is stored in HDFS, and I want to use PySpark to perform some analysis. In other words, we have a script in an IPython notebook and want to use it from Django. PySpark is available on PyPI, so I installed it with pip. The same script, exported from the notebook as a .py file, runs fine when I run it as python myscript.py, so I assume it should also work if I import it within Django. Is this the correct approach, or do I have to run spark-submit myscript.py? I want to use Spark in cluster mode.
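
For reference, here is a minimal sketch of what the import-based approach might look like. The names `spark_conf`, `get_spark`, and the app name `django-analytics` are hypothetical, and the `yarn` master is an assumption about the cluster (it also assumes `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points at the cluster configuration). Note that a session created this way keeps the driver inside the Python process that created it, which corresponds to client deploy mode; as far as I understand, running the driver on the cluster (cluster deploy mode) goes through spark-submit.

```python
import threading

_lock = threading.Lock()
_spark = None  # one shared session per Python process


def spark_conf(master="yarn", app_name="django-analytics"):
    """Return the settings handed to the SparkSession builder.

    The defaults are assumptions for illustration, not values from my setup.
    """
    return {
        "spark.master": master,
        "spark.app.name": app_name,
        # The driver lives in this (Django) process, i.e. client deploy mode.
        "spark.submit.deployMode": "client",
    }


def get_spark(master="yarn"):
    """Create the SparkSession on first use and reuse it afterwards."""
    global _spark
    with _lock:
        if _spark is None:
            # Imported lazily so that importing this module does not
            # require pyspark until a session is actually requested.
            from pyspark.sql import SparkSession

            builder = SparkSession.builder
            for key, value in spark_conf(master).items():
                builder = builder.config(key, value)
            _spark = builder.getOrCreate()
        return _spark
```

A Django view would then call `get_spark()` and run the notebook's analysis against the returned session, instead of building a new session on every request.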

Faizan Ali

0 Answers