6

I have Anaconda installed and I have also downloaded Spark 1.6.2. I am following the instructions from this answer to configure Spark for Jupyter.

I have downloaded and unzipped the spark directory as

~/spark

Now when I cd into this directory and then into bin, I see the following:

SFOM00618927A:spark $ cd bin
SFOM00618927A:bin $ ls
beeline         pyspark         run-example.cmd     spark-class2.cmd    spark-sql       sparkR
beeline.cmd     pyspark.cmd     run-example2.cmd    spark-shell     spark-submit        sparkR.cmd
load-spark-env.cmd  pyspark2.cmd        spark-class     spark-shell.cmd     spark-submit.cmd    sparkR2.cmd
load-spark-env.sh   run-example     spark-class.cmd     spark-shell2.cmd    spark-submit2.cmd

I have also added the environment variables as mentioned in the above answer to my .bash_profile and .profile

Now, in the spark/bin directory, the first thing I want to check is whether the pyspark command works in the shell at all.

So after doing cd spark/bin I run:

SFOM00618927A:bin $ pyspark
-bash: pyspark: command not found

As per the answer, after following all the steps I should be able to just run

pyspark 

in the terminal from any directory, and it should start a Jupyter notebook with the Spark engine. But pyspark isn't even working within the shell, let alone running in a Jupyter notebook.

Please advise what is going wrong here.

Edit:

I did

open .profile 

in my home directory, and this is what is stored in the path:

export PATH=/Users/854319/anaconda/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/Users/854319/spark/bin
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark
Baktaawar
  • Did you follow step 8 of the answer? Adding the bin folder to the PATH environment variable? – rfkortekaas Aug 06 '16 at 00:48
  • After doing cd spark/bin, $ ./pyspark will work, did u tried this – Siddharth Kumar Aug 07 '16 at 06:59
  • @rfkortekaas Yes i followed step 8 of the answer. I have all those in the path still its not working – Baktaawar Aug 08 '16 at 06:13
  • Can you add the contents of PATH:'echo $PATH' – rfkortekaas Aug 08 '16 at 06:14
  • @rfkortekaas Hi check below /Users/i854319/anaconda/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin Also I tried SANDHYALALKUMAR answer. It gave an error: No Java Runtime Installed even though I have installed it. Do i need to restart the computer to make it effective? – Baktaawar Aug 08 '16 at 06:17
  • @rfkortekaas check Edit. I added the contents of .profile file – Baktaawar Aug 08 '16 at 06:21
  • @rfkortekaas ok i set JDK up and now when I just do pyspark it opens up a jupyter notebook. I dont have to do ./pyspark.. Last thing- what if I just want to open the pyspark on console instead of opening a notebook. What command do we type then? – Baktaawar Aug 09 '16 at 17:40

3 Answers

4

1- You need to set JAVA_HOME and the Spark paths for the shell to find them. After setting them in your .profile you may want to run

source ~/.profile

to activate the setting in the current session. From your comment I can see you're already having the JAVA_HOME issue.

Note that if you have a .bash_profile or .bash_login, .profile will not be read, as described here

2- When you are in spark/bin you need to run

./pyspark

to tell the shell that the target is in the current folder.
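For reference, a minimal sketch of the lines that could go in ~/.profile (the ~/spark location is an assumption taken from the question; adjust it to wherever you unpacked Spark):

# JAVA_HOME via the macOS helper that prints the installed JDK path
export JAVA_HOME=$(/usr/libexec/java_home)
# Assumed unpack location from the question; change it if yours differs
export SPARK_HOME=~/spark
# Puts pyspark, spark-shell, spark-submit, etc. on the PATH
export PATH=$SPARK_HOME/bin:$PATH

Then source ~/.profile and check with which pyspark that the shell can now find the command from any directory.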

shuaiyuancn
  • ok i set this up and now when I just do pyspark it opens up a jupyter notebook. I dont have to do ./pyspark.. Last thing- what if I just want to open the pyspark on console instead of opening a notebook. What command do we type then? – Baktaawar Aug 09 '16 at 17:40
  • You need to clear the settings of `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS`. – shuaiyuancn Aug 10 '16 at 10:35
3

Here are my environment variables; hope they help you:

# path to JAVA_HOME
export JAVA_HOME=$(/usr/libexec/java_home)

#Spark
export SPARK_HOME="/usr/local/spark" #version 1.6
export PATH=$PATH:$SPARK_HOME/bin
export PYSPARK_SUBMIT_ARGS="--master local[2]"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

^^ Remove the PYSPARK_DRIVER_PYTHON_OPTS option if you don't want the notebook to launch; otherwise you can leave it out of your profile entirely and set it on the command line only when you need it.

I have the Anaconda variables on another line that appends them to the PATH.
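If you keep the notebook settings in your profile but occasionally want the plain PySpark console instead, one option (a sketch; it simply overrides both driver variables for a single invocation) is:

# One-off console session; the notebook settings in your profile stay untouched
PYSPARK_DRIVER_PYTHON=python PYSPARK_DRIVER_PYTHON_OPTS= pyspark

Alternatively, unset PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS in the current shell before running pyspark.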

Max
  • Helped particular string `export SPARK_HOME="/usr/local/spark" #version 1.6 `, however I've downloaded the source itself: `/Users/iamtodor/programming/tools/spark-3.2.0-bin-hadoop3.2` – iamtodor Dec 03 '21 at 14:02
1

For anyone who came here on or after macOS Catalina, make sure you're setting and sourcing the variables in ~/.zshrc and not in your bash profile, since zsh is now the default shell.

$ nano ~/.zshrc

# Set Spark Path
export SPARK_HOME="YOUR_PATH/spark-3.0.1-bin-hadoop2.7"
export PATH="$SPARK_HOME/bin:$PATH"

# Set pyspark + jupyter commands
export PYSPARK_SUBMIT_ARGS="pyspark-shell"
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='lab'

$ source ~/.zshrc

$ pyspark # Automatically opens Jupyter Lab w/ PySpark initialized.
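As a quick sanity check before launching (just a sketch of what the new shell should report, assuming the paths above):

$ echo $SPARK_HOME    # should print YOUR_PATH/spark-3.0.1-bin-hadoop2.7
$ which pyspark       # should resolve to the pyspark script under $SPARK_HOME/bin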

kevin_theinfinityfund