
I have a cluster of 4 nodes where Spark is already installed. I use pyspark or spark-shell to launch Spark and start programming.

I know how to use Zeppelin, but I would like to use Jupyter instead as the programming interface (IDE), because I find it more convenient.

I read that I should export these two variables in my .bashrc to make it work:

export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

How can I use PySpark with Jupyter?
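
From what I read, after exporting these and reloading the shell with `source ~/.bashrc`, launching `pyspark` should open a Jupyter notebook instead of the plain shell. A minimal check in a notebook cell, assuming Spark 2.x, where the PySpark shell predefines `spark` and `sc`:

```python
# Run in a notebook cell after launching with `pyspark`.
# Spark 2.x predefines `spark` (SparkSession) and `sc` (SparkContext) in the shell.
print(sc.master)        # should print the cluster master, e.g. yarn or spark://<host>:7077
spark.range(5).show()   # tiny job to confirm the session reaches the cluster
```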

HISI
  • Possible duplicate of [Configuring Spark to work with Jupyter Notebook and Anaconda](https://stackoverflow.com/q/47824131/8371915) – Alper t. Turker Jan 19 '18 at 21:37
  • @user8371915 is it the same for HDP (in a cluster)? – HISI Jan 19 '18 at 21:44
  • Both answers are generic and not specific to the Anaconda Python distribution or to a particular Hadoop / Spark distribution, so you should be fine on HDP. There is also [Apache Toree](https://toree.apache.org/), mentioned in the comments. – Alper t. Turker Jan 19 '18 at 21:57
  • Setting `PYSPARK_DRIVER_PYTHON="jupyter"` is a really crappy "solution", *especially* in a cluster environment, where it will cause problems downstream; see my answer in the link posted above by @user8371915, and the sketch after these comments – desertnaut Jan 19 '18 at 22:43
  • @desertnaut very useful answer, I think the problem is fixed now, but I still have some doubts about what I did – HISI Jan 19 '18 at 23:03
  • @hisi thanx - upvotes are most welcome ;) – desertnaut Jan 19 '18 at 23:04
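
A minimal sketch of the alternative desertnaut alludes to: leave `PYSPARK_DRIVER_PYTHON` unset, start Jupyter normally with `jupyter notebook`, and attach Spark from inside the notebook with findspark. The `findspark` package, the `yarn` master, and the app name are assumptions here, not taken from the thread; it presumes `pip install findspark` and a `SPARK_HOME` pointing at the Spark install:

```python
# Sketch only: bootstrap Spark inside a normally-started Jupyter notebook.
# Assumes `pip install findspark` and that SPARK_HOME is set on the driver node.
import findspark
findspark.init()  # puts $SPARK_HOME/python and its bundled py4j on sys.path

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("jupyter-sketch")  # hypothetical app name
         .master("yarn")             # assumption: an HDP cluster running YARN
         .getOrCreate())

spark.range(5).show()  # quick sanity check against the cluster
```

This keeps `pyspark` and `spark-shell` usable from the command line, since no driver variables are hijacked globally.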

0 Answers