
I have a cluster of 4 nodes where Spark is already installed. I use pyspark or spark-shell to launch Spark and start programming.

I know how to use Zeppelin, but I would like to use Jupyter instead as the programming interface (IDE), because I find it more convenient.

I read that I should export these two variables in my .bashrc to make it work:

export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

How can I use PySpark with Jupyter?
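
From what I read, after exporting these and reloading the shell with `source ~/.bashrc`, launching `pyspark` should open a Jupyter notebook instead of the plain shell. A minimal check in a notebook cell, assuming Spark 2.x, where the PySpark shell predefines `spark` and `sc`:

```python
# Run in a notebook cell after launching with `pyspark`.
# Spark 2.x predefines `spark` (SparkSession) and `sc` (SparkContext) in the shell.
print(sc.master)        # should print the cluster master, e.g. yarn or spark://<host>:7077
spark.range(5).show()   # tiny job to confirm the session reaches the cluster
```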

HISI
  • Possible duplicate of [Configuring Spark to work with Jupyter Notebook and Anaconda](https://stackoverflow.com/q/47824131/8371915) – Alper t. Turker Jan 19 '18 at 21:37
  • @user8371915 is it the same for HDP (in a cluster)? – HISI Jan 19 '18 at 21:44
  • Both answers are generic and not specific to the Anaconda Python distribution or to a particular Hadoop / Spark distribution, so you should be fine on HDP. There is also [Apache Toree](https://toree.apache.org/), mentioned in the comments. – Alper t. Turker Jan 19 '18 at 21:57
  • Setting `PYSPARK_DRIVER_PYTHON="jupyter"` is a really crappy "solution", *especially* in a cluster environment, where it will cause problems downstream; see my answer in the link posted above by @user8371915, and the sketch after these comments – desertnaut Jan 19 '18 at 22:43
  • @desertnaut very useful answer, I think the problem is fixed now, but I still have some doubts about what I did – HISI Jan 19 '18 at 23:03
  • @hisi thanx - upvotes are most welcome ;) – desertnaut Jan 19 '18 at 23:04
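
A minimal sketch of the alternative desertnaut alludes to: leave `PYSPARK_DRIVER_PYTHON` unset, start Jupyter normally with `jupyter notebook`, and attach Spark from inside the notebook with findspark. The `findspark` package, the `yarn` master, and the app name are assumptions here, not taken from the thread; it presumes `pip install findspark` and a `SPARK_HOME` pointing at the Spark install:

```python
# Sketch only: bootstrap Spark inside a normally-started Jupyter notebook.
# Assumes `pip install findspark` and that SPARK_HOME is set on the driver node.
import findspark
findspark.init()  # puts $SPARK_HOME/python and its bundled py4j on sys.path

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("jupyter-sketch")  # hypothetical app name
         .master("yarn")             # assumption: an HDP cluster running YARN
         .getOrCreate())

spark.range(5).show()  # quick sanity check against the cluster
```

This keeps `pyspark` and `spark-shell` usable from the command line, since no driver variables are hijacked globally.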

0 Answers