
I have Apache Spark installed on Ubuntu at /home/mymachine/spark-2.1.0-bin-hadoop2.7. To use Spark, I either have to go to the python directory located under that path, or I can use it from anywhere with the help of a library called findspark. However, it seems I have to init this library like this:

import findspark
findspark.init("/home/mymachine/spark-2.1.0-bin-hadoop2.7")

every time I want to use findspark, which is not very efficient. Is there any way to init this library permanently?
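
As a side note, findspark can locate Spark from the environment on its own: when init() is called with no argument, it falls back to the SPARK_HOME variable, so the hard-coded path can be dropped once the variable is set. A minimal sketch, assuming SPARK_HOME is visible to the Python process:

import findspark
# With no path argument, findspark falls back to the
# SPARK_HOME environment variable
findspark.init()

import pyspark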

Here it was mentioned that I need to set the SPARK_HOME variable in .bash_profile, and I did, but no luck.

antonifs
  • Hi, I had a similar issue on OSX. Meaning, I added SPARK_HOME to my .bash_profile and had no luck. I had to `source .bash_profile` to resolve it. – Bob Haffner Sep 23 '17 at 16:28
  • Hi @bob-haffner, what do you mean by `source .bash_profile` to resolve? – antonifs Sep 23 '17 at 16:37
  • Hi HW, when you add an env var (e.g. SPARK_HOME) to .bash_profile you need to close and reopen your shell or do `source .bash_profile` so you can use it. Does that make sense? – Bob Haffner Sep 23 '17 at 16:42
  • I should note that's what I do on OSX. I'm not too familiar with Ubuntu. – Bob Haffner Sep 23 '17 at 16:51
  • Hi @bob-haffner, yes, certainly. It should work the same in Ubuntu, but this time it's still not working. – antonifs Sep 23 '17 at 17:18
  • Notebook or script or interpreter? And what happens when you do `os.environ.get('SPARK_HOME')` in your notebook/interpreter/script? I'm guessing nothing – Bob Haffner Sep 23 '17 at 17:26
  • Notebook and script. It actually has a response: `/home/mymachine/spark-2.1.0-bin-hadoop2.7`. Weird, right? – antonifs Sep 23 '17 at 17:30
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/155162/discussion-between-bob-haffner-and-helloworld). – Bob Haffner Sep 23 '17 at 17:33

1 Answer


Add the following variables to your .bashrc file:

export SPARK_HOME=/path/2/spark/folder
export PATH=$SPARK_HOME/bin:$PATH

Then run `source .bashrc`.
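
To confirm the export is actually visible to Python, a quick check from a script or notebook cell:

import os
# If this prints None, the shell that launched Python never
# sourced .bashrc, so the export did not reach this process
print(os.environ.get("SPARK_HOME"))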
If you wish to run pyspark with a Jupyter notebook, also add these variables to .bashrc:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

Then run `source .bashrc` again.
Now if you run pyspark from the shell, it will launch a Jupyter notebook server and pyspark will be available in Python kernels.
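
When the notebook launched by pyspark opens, the startup script pre-creates a SparkContext (sc) and, in Spark 2.x, a SparkSession (spark), so a cell can use Spark immediately. A quick smoke test:

# `sc` and `spark` are created by the pyspark startup script,
# so no explicit initialization is needed in the notebook
rdd = sc.parallelize(range(10))
print(rdd.sum())  # expected output: 45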

pauli