I have Ubuntu 18.04. I am working with the Anaconda distribution so that I can use Jupyter notebooks and Python. I want to install Spark and PySpark to work with HDFS. What is the proper installation procedure for this? Thank you.
- I tried using the conda installation and a manual installation, but I am getting confused about setting up the environment variables in the .bashrc file – Alay Majmudar Sep 09 '18 at 11:58
1 Answer
conda install -c conda-forge pyspark
This installs PySpark into your Anaconda environment from the conda-forge channel. For it to work with Spark, just run your code on the Spark cluster. For more information, look here, which has some references on using Anaconda specifically with PySpark and Spark.
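As a quick smoke test that the conda-forge package works end to end, something like the following can be run from a Jupyter notebook; note that the HDFS namenode host, port, and file path below are placeholders and need to be replaced with your cluster's actual values.

from pyspark.sql import SparkSession

# The conda-forge pyspark package bundles Spark itself, so a local session
# can be started without setting SPARK_HOME.
spark = (SparkSession.builder
         .appName("hdfs-smoke-test")
         .master("local[*]")   # replace with your cluster master, e.g. "yarn"
         .getOrCreate())

# Placeholder HDFS URI: swap in your namenode host/port and a real file path.
df = spark.read.text("hdfs://namenode:9000/user/alay/sample.txt")
df.show(5)

spark.stop()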

Neeyanth Kopparapu
- What changes do I have to make in the .bashrc file to set up any variables? – Alay Majmudar Sep 09 '18 at 11:56
- The only ones I can imagine are making sure the PYTHONPATH and conda PATH variables are set properly, but if you can already use Python with all your packages, I don't see why that should be a problem – Neeyanth Kopparapu Sep 09 '18 at 16:13
- Thanks a lot. It started working. Actually, I had tried the conda installation before, but I was using a newer version of Java that was not compatible with Spark 2.3.1. – Alay Majmudar Sep 10 '18 at 03:52
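Since the thread was resolved by switching to a Spark-compatible Java version rather than by editing .bashrc, here is a minimal sketch of pinning the JVM and the Python interpreter from inside the notebook itself via os.environ. The JDK path is an assumption (a typical Ubuntu 18.04 location for OpenJDK 8, which is known to work with Spark 2.3.x) and must match your own installation.

import os
import sys

# Assumed path to a Java 8 JDK on Ubuntu 18.04; adjust to wherever your
# Spark-compatible JDK actually lives.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"

# Make the Spark driver and executors use the same interpreter as the notebook,
# so the conda environment's packages are visible to PySpark.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("env-check").master("local[*]").getOrCreate()
print(spark.version)   # e.g. 2.3.1
spark.stop()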