
I have Apache Spark installed on Ubuntu at /home/mymachine/spark-2.1.0-bin-hadoop2.7. To use Spark, I either have to go to the python directory located under that path, or I can use it from anywhere with the help of a library called findspark. However, it seems I have to init this library like this:

import findspark
findspark.init("/home/mymachine/spark-2.1.0-bin-hadoop2.7")

every time I want to use findspark, which is not very efficient. Is there any way to init this library permanently?
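
As a side note, findspark can locate Spark from the environment on its own: when init() is called with no argument, it falls back to the SPARK_HOME variable, so the hard-coded path can be dropped once the variable is set. A minimal sketch, assuming SPARK_HOME is visible to the Python process:

import findspark
# With no path argument, findspark falls back to the
# SPARK_HOME environment variable
findspark.init()

import pyspark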

Here it was mentioned that I need to set the SPARK_HOME variable in .bash_profile, and I did, but no luck.

antonifs
  • Hi, I had a similar issue on OSX. Meaning, I added SPARK_HOME to my .bash_profile and had no luck. I had to `source .bash_profile` to resolve it. – Bob Haffner Sep 23 '17 at 16:28
  • Hi @bob-haffner, what do you mean by `source .bash_profile` to resolve? – antonifs Sep 23 '17 at 16:37
  • Hi HW, when you add an env var (e.g. SPARK_HOME) to .bash_profile you need to close and reopen your shell or do `source .bash_profile` so you can use it. Does that make sense? – Bob Haffner Sep 23 '17 at 16:42
  • I should note that's what I do on OSX. I'm not too familiar with Ubuntu. – Bob Haffner Sep 23 '17 at 16:51
  • Hi @bob-haffner, yes, certainly. It should work the same in Ubuntu, but this time it's still not working. – antonifs Sep 23 '17 at 17:18
  • Notebook or script or interpreter? And what happens when you do `os.environ.get('SPARK_HOME')` in your notebook/interpreter/script? I'm guessing nothing – Bob Haffner Sep 23 '17 at 17:26
  • Notebook and script. It actually has a response: `/home/mymachine/spark-2.1.0-bin-hadoop2.7`. Weird, right? – antonifs Sep 23 '17 at 17:30
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/155162/discussion-between-bob-haffner-and-helloworld). – Bob Haffner Sep 23 '17 at 17:33

1 Answer


Add the following variables to your .bashrc file:

export SPARK_HOME=/path/2/spark/folder
export PATH=$SPARK_HOME/bin:$PATH

Then run `source .bashrc`.
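
To confirm the export is actually visible to Python, a quick check from a script or notebook cell:

import os
# If this prints None, the shell that launched Python never
# sourced .bashrc, so the export did not reach this process
print(os.environ.get("SPARK_HOME"))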
If you wish to run pyspark with a Jupyter notebook, also add these variables to .bashrc:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

Then run `source .bashrc` again.
Now if you run pyspark from the shell, it will launch a Jupyter notebook server and pyspark will be available in Python kernels.
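
When the notebook launched by pyspark opens, the startup script pre-creates a SparkContext (sc) and, in Spark 2.x, a SparkSession (spark), so a cell can use Spark immediately. A quick smoke test:

# `sc` and `spark` are created by the pyspark startup script,
# so no explicit initialization is needed in the notebook
rdd = sc.parallelize(range(10))
print(rdd.sum())  # expected output: 45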

pauli